from
Operator
from — source data from pools, files, or URIs
Synopsis
from <pool>[@<commitish>]
from <pattern>
Description
The from operator identifies one or more data sources and transmits
their data to its output. A data source can be
- the name of a data pool in a SuperDB lake, with optional commitish;
- the names of multiple data pools, expressed as a regular expression or glob pattern;
- a path to a file;
- an HTTP, HTTPS, or S3 URI; or
- the pass operator, to treat the upstream pipeline branch as a source.
✵ Note ✵
File paths and URIs may be followed by an optional format specifier.
Sourcing data from pools is only possible when querying a lake, such as
via the super db command or SuperDB lake API. Sourcing data from files is only possible
with the super command.
When a single pool name is specified without @-referencing a commit or ID, or
when using a pool pattern, the tip of the main branch of each pool is
accessed.
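As a minimal sketch of the from <pool>[@<commitish>] form in the Synopsis, the shell helper below assembles the clause with and without a commitish. The name from_clause is hypothetical and is not part of SuperDB; it only builds the query text and does not contact a lake.

```shell
# Hypothetical helper (not part of SuperDB): print a from clause for a pool,
# appending @<commitish> only when one is given.
from_clause() {
  pool=$1
  commitish=${2:-}
  if [ -n "$commitish" ]; then
    printf 'from %s@%s\n' "$pool" "$commitish"
  else
    printf 'from %s\n' "$pool"
  fi
}

from_clause coinflips        # prints: from coinflips
from_clause coinflips trial  # prints: from coinflips@trial
```

The resulting text can be passed to super db with the -c flag. Since a bare pool name reads the tip of the main branch, from coinflips and from coinflips@main should return the same data.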
In the first four forms, a single source is connected to a single output. In the fifth form, multiple sources are accessed in parallel and may be joined, combined, or merged.
A pipeline can be split with the fork operator as in
from PoolOne | fork
( op1 | op2 | ... )
( op1 | op2 | ... )
| merge ts | ...
Or multiple pools can be accessed and, for example, joined:
fork
( from PoolOne | op1 | op2 | ... )
( from PoolTwo | op1 | op2 | ... )
| join on key=key | ...
Similarly, data can be routed to different pipeline branches with replication
using the switch operator:
from ... | switch color
case "red" ( op1 | op2 | ... )
case "blue" ( op1 | op2 | ... )
default ( op1 | op2 | ... )
| ...
Input Data
The examples below assume the existence of the SuperDB lake created and populated by the following commands:
export SUPER_DB=example
super db -q init
super db -q create -orderby flip:desc coinflips
echo '{flip:1,result:"heads"} {flip:2,result:"tails"}' |
super db load -q -use coinflips -
super db branch -q -use coinflips trial
echo '{flip:3,result:"heads"}' | super db load -q -use coinflips@trial -
super db -q create numbers
echo '{number:1,word:"one"} {number:2,word:"two"} {number:3,word:"three"}' |
super db load -q -use numbers -
super db -f text -c '
from :branches
| values pool.name + "@" + branch.name
| sort'
The lake then contains two pools, with the following branches:
coinflips@main
coinflips@trial
numbers@main
The following file hello.sup is also used.
{greeting:"hello world!"}
Examples
Source structured data from a local file
super -s -c 'from hello.sup | values greeting'
=>
"hello world!"
Source data from a local file, but in line format
super -s -c 'from hello.sup format line'
=>
"{greeting:\"hello world!\"}"
Source structured data from a URI
super -s -c 'from https://raw.githubusercontent.com/brimdata/zui-insiders/main/package.json
| values productName'
=>
"Zui - Insiders"
Source data from the main branch of a pool
super db -db example -s -c 'from coinflips'
=>
{flip:2,result:"tails"}
{flip:1,result:"heads"}
Source data from a specific branch of a pool
super db -db example -s -c 'from coinflips@trial'
=>
{flip:3,result:"heads"}
{flip:2,result:"tails"}
{flip:1,result:"heads"}
Count the number of values in the main branch of all pools
super db -db example -f text -c 'from * | count()'
=>
5
Join the data from multiple pools
super db -db example -s -c '
from coinflips | sort flip
| join (
from numbers | sort number
) on left.flip=right.number
| values {...left, word:right.word}'
=>
{flip:1,result:"heads",word:"one"}
{flip:2,result:"tails",word:"two"}
Use pass to combine our join output with data from yet another source
super db -db example -s -c '
from coinflips | sort flip
| join (
from numbers | sort number
) on left.flip=right.number
| values {...left, word:right.word}
| fork
( pass )
( from coinflips@trial
| c:=count()
| values f"There were {int64(c)} flips" )
| sort this'
=>
"There were 3 flips"
{flip:1,result:"heads",word:"one"}
{flip:2,result:"tails",word:"two"}
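The switch form shown in the Description can be tried against the same example lake. The sketch below is illustrative only: it assumes the lake from the Input Data section exists, the outcome field and its "win"/"loss" values are made up for the example, and the guard simply skips the query when the super binary is not on the PATH. No expected output is shown because the query has not been verified against a live lake.

```shell
# Route each coin flip to a branch based on its result, tag it with a
# hypothetical outcome field, then bring the branches back together.
query='
from coinflips@trial
| switch result
  case "heads" ( values {...this, outcome:"win"} )
  case "tails" ( values {...this, outcome:"loss"} )
| sort flip'

# Run the query only when the super binary is installed.
if command -v super >/dev/null 2>&1; then
  super db -db example -s -c "$query"
fi
```

The trailing sort flip mirrors the other examples, which sort after parallel branches so that the output order is predictable.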