Window Functions
Windows functions let you perform calculations across a set of rows that are related to the current row. Some of the calculations that you can do are similar to those that can be done with an aggregate function, but a window function doesn't cause rows to be grouped into a single output - the individual rows are still returned.
Standard Window Functions
ClickHouse supports the standard grammar for defining windows and window functions. The table below indicates whether a feature is currently supported.
Feature | Supported? |
---|---|
ad hoc window specification (count(*) over (partition by id order by time desc) ) | ✅ |
expressions involving window functions, e.g. (count(*) over ()) / 2) | ✅ |
WINDOW clause (select ... from table window w as (partition by id) ) | ✅ |
ROWS frame | ✅ |
RANGE frame | ✅ (the default) |
INTERVAL syntax for DateTime RANGE OFFSET frame | ❌ (specify the number of seconds instead (RANGE works with any numeric type).) |
GROUPS frame | ❌ |
Calculating aggregate functions over a frame (sum(value) over (order by time) ) | ✅ (All aggregate functions are supported) |
rank() , dense_rank() , row_number() | ✅ Alias: denseRank() |
percent_rank() | ✅ Efficiently computes the relative standing of a value within a partition in a dataset. This function effectively replaces the more verbose and computationally intensive manual SQL calculation expressed as ifNull((rank() OVER(PARTITION BY x ORDER BY y) - 1) / nullif(count(1) OVER(PARTITION BY x) - 1, 0), 0) Alias: percentRank() |
lag/lead(value, offset) | ❌ You can use one of the following workarounds: 1) any(value) over (.... rows between <offset> preceding and <offset> preceding) , or following for lead 2) lagInFrame/leadInFrame , which are analogous, but respect the window frame. To get behavior identical to lag/lead , use rows between unbounded preceding and unbounded following |
ntile(buckets) | ✅ Specify window like, (partition by x order by y rows between unbounded preceding and unrounded following). |
ClickHouse-specific Window Functions
There is also the following ClickHouse specific window function:
nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL X UNITS])
Finds non-negative derivative for given metric_column
by timestamp_column
.
INTERVAL
can be omitted, default is INTERVAL 1 SECOND
.
The computed value is the following for each row:
0
for 1st row,- for row.
Syntax
PARTITION BY
- defines how to break a resultset into groups.ORDER BY
- defines how to order rows inside the group during calculation aggregate_function.ROWS or RANGE
- defines bounds of a frame, aggregate_function is calculated within a frame.WINDOW
- allows multiple expressions to use the same window definition.
Functions
These functions can be used only as a window function.
row_number()
- Number the current row within its partition starting from 1.first_value(x)
- Return the first value evaluated within its ordered frame.last_value(x)
- Return the last value evaluated within its ordered frame.nth_value(x, offset)
- Return the first non-NULL value evaluated against the nth row (offset) in its ordered frame.rank()
- Rank the current row within its partition with gaps.dense_rank()
- Rank the current row within its partition without gaps.lagInFrame(x)
- Return a value evaluated at the row that is at a specified physical offset row before the current row within the ordered frame.leadInFrame(x)
- Return a value evaluated at the row that is offset rows after the current row within the ordered frame.
Examples
Let's have a look at some examples of how window functions can be used.
Numbering rows
Aggregation functions
Compare each player's salary to the average for their team.
Compare each player's salary to the maximum for their team.
Partitioning by column
Frame bounding
Real world examples
The following examples solve common real-world problems.
Maximum/total salary per department
Cumulative sum
Moving / Sliding Average (per 3 rows)
Moving / Sliding Average (per 10 seconds)
Moving / Sliding Average (per 10 days)
Temperature is stored with second precision, but using Range
and ORDER BY toDate(ts)
we form a frame with the size of 10 units, and because of toDate(ts)
the unit is a day.
References
GitHub Issues
The roadmap for the initial support of window functions is in this issue.
All GitHub issues related to window functions have the comp-window-functions tag.
Tests
These tests contain the examples of the currently supported grammar:
https://github.com/ClickHouse/ClickHouse/blob/master/tests/performance/window_functions.xml
Postgres Docs
https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW
https://www.postgresql.org/docs/devel/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
https://www.postgresql.org/docs/devel/functions-window.html
https://www.postgresql.org/docs/devel/tutorial-window.html
MySQL Docs
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html