aggregate
Operator
aggregate — perform aggregations
Synopsis
[aggregate] [<field>:=]<agg>
[aggregate] [<field>:=]<agg> [where <expr>][, [<field>:=]<agg> [where <expr>] ...]
[aggregate] [<field>:=]<agg> [by [<field>][:=<expr>][, [<field>][:=<expr>]] ...]
[aggregate] [<field>:=]<agg> [where <expr>][, [<field>:=]<agg> [where <expr>] ...] [by [<field>][:=<expr>][, [<field>][:=<expr>]] ...]
[aggregate] by [<field>][:=<expr>][, [<field>][:=<expr>] ...]
Description
In the first four forms, the aggregate
operator consumes all of its input,
applies an aggregate function to each input value
optionally filtered by a where
clause and/or organized with the grouping
keys specified after the by
keyword, and at the end of input produces one
or more aggregations for each unique set of grouping key values.
In the final form, aggregate
consumes all of its input, then outputs each
unique combination of values of the grouping keys specified after the by
keyword.
The aggregate
keyword is optional since it is an
implied operator.
Each aggregate function may be optionally followed by a where
clause, which
applies a Boolean expression that indicates, for each input value,
whether to deliver it to that aggregate. (where
clauses are analogous
to the where
operator.)
The output field names for each aggregate and each key are optional. If omitted,
a field name is inferred from each right-hand side, e.g, the output field for the
count
aggregate function is simply count
.
A key may be either an expression or a field. If the key field is omitted,
it is inferred from the expression, e.g., the field name for by lower(s)
is lower
.
When the result of aggregate
is a single value (e.g., a single aggregate
function without grouping keys) and there is no field name specified, then
the output is that single value rather than a single-field record
containing that value.
If the cardinality of grouping keys causes the memory footprint to exceed a limit, then each aggregate’s partial results are spilled to temporary storage and the results merged into final results using an external merge sort. The same mechanism that spills to storage can also spill across the network to a cluster of workers in an adaptive shuffle, though this is not yet implemented.
Examples
Average the input sequence:
aggregate avg(this)
1
2
3
4
Loading...
echo '1
2
3
4' \
| super -z -c 'aggregate avg(this)' -
To format the output of a single-valued aggregation into a record, simply specify an explicit field for the output:
aggregate mean:=avg(this)
1
2
3
4
Loading...
echo '1
2
3
4' \
| super -z -c 'aggregate mean:=avg(this)' -
When multiple aggregate functions are specified, even without explicit field names, a record result is generated with field names implied by the functions:
aggregate avg(this),sum(this),count()
1
2
3
4
Loading...
echo '1
2
3
4' \
| super -z -c 'aggregate avg(this),sum(this),count()' -
Sum the input sequence, leaving out the aggregate
keyword:
sum(this)
1
2
3
4
Loading...
echo '1
2
3
4' \
| super -z -c 'sum(this)' -
Create integer sets by key and sort the output to get a deterministic order:
set:=union(v) by key:=k | sort
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
Loading...
echo '{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}' \
| super -z -c 'set:=union(v) by key:=k | sort' -
Use a where
clause:
set:=union(v) where v > 1 by key:=k | sort
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
Loading...
echo '{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}' \
| super -z -c 'set:=union(v) where v > 1 by key:=k | sort' -
Use separate where
clauses on each aggregate function:
set:=union(v) where v > 1,
array:=collect(v) where k=="foo"
by key:=k
| sort
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
Loading...
echo '{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}' \
| super -z -c 'set:=union(v) where v > 1,
array:=collect(v) where k=="foo"
by key:=k
| sort' -
Results are included for by
groupings that generate null results when where
clauses are used inside aggregate
:
sum(v) where k=="bar" by key:=k | sort
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
Loading...
echo '{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}' \
| super -z -c 'sum(v) where k=="bar" by key:=k | sort' -
To avoid null results for by
groupings as just shown, filter before aggregate
:
k=="bar" | sum(v) by key:=k | sort
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
Loading...
echo '{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}' \
| super -z -c 'k=="bar" | sum(v) by key:=k | sort' -
Output just the unique key values:
by k | sort
{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}
Loading...
echo '{k:"foo",v:1}
{k:"bar",v:2}
{k:"foo",v:3}
{k:"baz",v:4}' \
| super -z -c 'by k | sort' -