Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Operator

aggregate — execute aggregate functions with optional grouping expressions

Synopsis

[aggregate] <agg> [, <agg> ... ] [ by <grouping> [, <grouping> ... ] ]
[aggregate] by <grouping> [, <grouping> ... ]

where <agg> references an aggregate function optionally structured as a field assignment having the form:

[ <field> := ] <agg-func> ( [ all | distinct ] <expr> ) [ where <pred> ]

and <grouping> is a grouping expression field assignment having the form:

[ <field> := ] <expr>

Description

The aggregate operator applies aggregate functions to partitioned groups of its input values to reduce each group to one output value where the result of each aggregate function appears as a field of the result.

Each group corresponds to the unique values of the <grouping> expressions. When there are no <grouping> expressions, the aggregate functions are applied to the entire input optionally filtered by <pred>.

In the first form, the aggregate operator consumes all of its input, applies one or more aggregate functions <agg> to each input value optionally filtered by a where clause and/or organized with the grouping expressions specified after the by keyword, and at the end of input produces one or more aggregations for each unique set of grouping key values.

In the second form, aggregate consumes all of its input, then outputs each unique combination of values of the grouping expressions specified after the by keyword without applying any aggregate functions.

The aggregate keyword is optional since it can be used as a shortcut.

Each aggregate function <agg-func> may be optionally followed by a where clause, which applies a Boolean expression <pred> that indicates, for each input value, whether to include it in the values operated upon by the aggregate function. where clauses are analogous to the where operator but apply their filter to the input argument stream to the aggregate function.

The output values are records formed from the field assignments first from the grouping expressions then from the aggregate functions in left-to-right order.

When the result of aggregate is a single value (e.g., a single aggregate function without grouping expressions or a single grouping expression without aggregates) and there is no field name specified, then the output is that single value rather than a single-field record containing that value.

If the cardinality of grouping expressions causes the memory footprint to exceed a limit, then each aggregate’s partial results are spilled to temporary storage and the results merged into final results using an external merge sort.

Note

Spilling is not yet implemented for the vectorized runtime.

Examples


Average the input sequence

# spq
aggregate avg(this)
# input
1
2
3
4
# expected output
2.5

To format the output of a single-valued aggregation into a record, simply specify an explicit field for the output

# spq
aggregate mean:=avg(this)
# input
1
2
3
4
# expected output
{mean:2.5}

When multiple aggregate functions are specified, even without explicit field names, a record result is generated with field names implied by the functions

# spq
aggregate avg(this),sum(this),count()
# input
1
2
3
4
# expected output
{avg:2.5,sum:10,count:4::uint64}

Sum the input sequence, leaving out the aggregate keyword

# spq
sum(this)
# input
1
2
3
4
# expected output
10

Create integer sets by key and sort the output to get a deterministic order

# spq
set:=union(v) by key:=k | sort
# input
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
# expected output
{key:"bar",set:|[2]|}
{key:"baz",set:|[4]|}
{key:"foo",set:|[1,3]|}

Use a where clause

# spq
set:=union(v) where v > 1 by key:=k | sort
# input
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
# expected output
{key:"bar",set:|[2]|}
{key:"baz",set:|[4]|}
{key:"foo",set:|[3]|}

Use a separate where clause on each aggregate function

# spq
set:=union(v) where v > 1,
array:=collect(v) where k=="foo"
  by key:=k
| sort
# input
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
# expected output
{key:"bar",set:|[2]|,array:null}
{key:"baz",set:|[4]|,array:null}
{key:"foo",set:|[3]|,array:[1,3]}

Results are included for by groupings that generate null results when where clauses are used inside aggregate

# spq
sum(v) where k=="bar" by key:=k | sort
# input
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
# expected output
{key:"bar",sum:2}
{key:"baz",sum:null}
{key:"foo",sum:null}

To avoid null results for by groupings as just shown, filter before aggregate

# spq
k=="bar" | sum(v) by key:=k | sort
# input
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
# expected output
{key:"bar",sum:2}

Output just the unique key values

# spq
by k | sort
# input
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
# expected output
"bar"
"baz"
"foo"