Subqueries

A subquery is a query embedded in an expression.

When the expression containing the subquery is evaluated, the query is run with an input consisting of a single value equal to the value being evaluated by the expression.

The syntax for a subquery is simply a query in parentheses as in

( <query> )

where <query> is any query, e.g., the query

values {s:(values "hello, world" | upper(this))}

results in in the value {s:"HELLO, WORLD"}.

Except for subqueries appearing as the right-hand side of an in operator, the result of a subquery must be a single value. When multiple values are generated, an error is produced.

For the in operator, any subquery on the right-hand side is always treated as an array subquery, thus providing compatibility with SQL syntax.

Array Subqueries

When multiple values are expected, an array subquery can be used to group the multi-valued result into a single-valued array.

The syntax for an array subquery is simply a query in square brackets as in

[ <query> ]

where <query> is any query, e.g., the query

values {a:[values 1,2,3 | values this+1]}

results in the value {a:[2,3,4]}.

An array subquery is shorthand for

( <query> | collect(this) )

e.g., the array subquery above could also be rewritten as

values {a:(values 1,2,3 | values this+1 | collect(this))}

Independent Subqueries

A subquery that depends on its input as described above is called a dependent subquery.

When the subquery ignores its input value, e.g., when it begins with a from operator, then they query is called an independent subquery.

For efficiency, the system materializes independent subqueries so that they are evaluated just once.

For example, this query

let input = (values 1,2,3)
values 3,4
| values {that:this,count:(from input | count())}

evaluates the subquery from input | count() just once and materializes the result. Then, for each input value 3 and 4, the result is emitted, e.g.,

{that:3,count:3}
{that:4,count:3}

When a subquery appears within a SQL operator, relational scope is active and references to table aliases and columns may reach a scope that is outside of the subquery. In this case, the subquery is a correlated subquery.

Correlated subqueries are not yet supported. They are detected and a compile-time error is reported when encountered.

A correlated subquery can always be rewritten as a pipe subquery using unnest using this pattern:

unnest {outer:this,inner:[<query>]}

where <query> generates the correlated subquery values, then they can be accessed as if the outer field is the outer scope and the inner field is the subquery scope.

Named Subqueries

When a previously declared named query is referenced in an expression, it is automatically evaluated as a subquery, e.g.,

let q = (values 1,2,3 | max(this))
values q+1

outputs the value 4.

When a named query is expected to return multiple values, it should be referenced as an array subquery, e.g.,

let q = (values 1,2,3)
values [q]

outputs the value [1,2,3].

Recursive Subqueries

When subqueries are combined with recursive invocation of the function they appear in, some powerful patterns can be constructed.

For example, the visitor-walk pattern can be implemented using recursive subqueries and function values.

Here’s a template for walk:

fn walk(node, visit):
  case kind(node)
  when "array" then
    [unnest node | walk(this, visit)]
  when "record" then
    unflatten([unnest flatten(node) | {key,value:walk(value, visit)}])
  when "union" then
    walk(under(node), visit)
  else visit(node)
  end

Note

In this case, we are traversing only records and arrays. Support for flattening and unflattening maps and sets is forthcoming.

Here, walk is invoking an array subquery on the unnested entities (records or arrays), calling the walk function recursively on each item, then assembling the results back into an array (i.e., the raw result of the array subquery) or a record (i.e., calling unflatten on the key/value pairs returned in the array).

If we call walk with this function on an arbitrary nested value

fn addOne(node): case typeof(node) when <int64> then node+1 else node end

then each leaf value of the nested value of type int64 would be incremented while the other leaves would be left alone. See the example below.

Examples

Operate on arrays with values shortcuts and arrange answers into a record

# spq
values {
    squares:[unnest this | this*this],
    roots:[unnest this | round(sqrt(this)*100)*0.01]
}
# input
[1,2,3]
[4,5]
# expected output
{squares:[1,4,9],roots:[1.,1.41,1.73]}
{squares:[16,25],roots:[2.,2.24]}

Multi-valued subqueries emit an error

# spq
values (values 1,2,3)
# input

# expected output
error("query expression produced multiple values (consider [subquery])")

Multi-valued subqueries can be invoked as an array subquery

# spq
values [values 1,2,3]
# input

# expected output
[1,2,3]

Right-hand side of “in” operator is always an array subquery

# spq
let data = (values {x:1},{x:2})
where this in (select x from data)
# input
1
2
3
# expected output
1
2

Independent subqueries in SQL operators are supported while correlated subqueries are not

# spq
let input = (values {x:1},{x:2},{x:3})
select x
from input
where x >= (select avg(x) from input)  
# input

# expected output
{x:2}
{x:3}

Correlated subqueries in SQL operators not yet supported

# spq
select x
from (values (1),(2)) a(x)
where exists (
  select 1
  from (values (3),(4)) b(y)
  where x=y
)
# input

# expected output
correlated subqueries not currently supported at line 6, column 9:
  where x=y
        ~

Recursive subqueries inside function implementing the walk-visitor pattern

# spq
fn walk(node, visit):
  case kind(node)
  when "array" then
    [unnest node | walk(this, visit)]
  when "record" then
    unflatten([unnest flatten(node) | {key,value:walk(value, visit)}])
  when "union" then
    walk(under(node), visit)
  else visit(node)
  end
fn addOne(node): case typeof(node) when <int64> then node+1 else node end
values walk(this, &addOne)
# input
1
[1,2,3]
[{x:[1,"foo"]},{y:2}]
# expected output
2
[2,3,4]
[{x:[2,"foo"]},{y:3}]

Keyboard shortcuts