Operator
from — source data from databases, files, or URLs
Synopsis
from <file> [ ( format <fmt> ) ]
from <pool> [@<commit>]
from <url> [ ( format <fmt> method <method> headers <expr> body <string> ) ]
from eval(<expr>) [ ( format <fmt> method <id> headers <expr> body <string> ) ]
Description
The from operator identifies one or more sources of data as input to
a query and transmits that data to its output.
It has two forms:
- a
frompipe operator with pipe scoping as described here, or - a SQL
FROMclause with relational scoping.
As a pipe operator,
from preserves the order of the data within a file,
URL, or a sorted pool but when multiple sources are identified,
the data may be read in parallel and interleaved in an undefined order.
Optional arguments to from may be appended as a parenthesized concatenation
of arguments.
When reading from sources external to a database (e.g., URLs or files), the format of each data source is automatically detected using heuristics. To manually specify the format of a source and override the autodetection heuristic, a format argument may be appended as an argument and has the form
format <fmt>
where <fmt> is the name of a supported
serialization format and is
parsed as a text entity.
When from references a file or URL entity whose name ends in a
well-known extension
(e.g., .json, .sup, etc.), auto-detection is disabled and the
format is implied by the extension name.
File-System Operation
When running detached from a database, the target of from
is either a
text entity
or a file system glob.
If a text entity is parseable as an HTTP or HTTPS URL,
then the target is presumed to be a URL and is processed
accordingly. Otherwise, the target is assumed to be a file
in the file system whose path is relative to the directory
in which the super command is running.
If the target is a glob, then the glob is expanded and the files are processed in an undefined order. Any operator arguments specified after a glob target are applied to all of the matched files.
Here are a few examples illustrating file references:
from "file.sup"
from file.json
from file*.parq (format parquet)
Database Operation
When running attached to a database (i.e., using super db),
the target of from is either a
text entity
or a regular expression
or glob that matches pool names.
If a text entity is parseable as an HTTP or HTTPS URL, then the target is presumed to be a URL and is processed accordingly. Otherwise, the target is assumed to be the name of a pool in the attached database.
Local files are not accessible when attached to a database.
Note that pool names and file names have similar syntax in from but
their use is disambiguated by the presence or absence of an attached
database.
When multiple data pools are referenced with a glob or regular expression, they are scanned in an undefined order.
The reference string for a pool may also be appended with an @-style
commitish, which specifies that
data is sourced from a specific commit in a pool’s commit history.
When a single pool name is specified without an @ reference, or
when using a glob or regular expression, the tip of the main
branch of each pool is accessed.
The format argument is not valid with a database source.
Note
Metadata from database pools also may be sourced using
from. This will be documented in a future release of SuperDB.
URL
Data sources identified by URLs can be accessed either when attached or detached from a database.
When the <url> argument begins with http: or https:
and has the form of a valid URL, then the source is fetched remotely using the
indicated protocol.
As a text entity, typical URLs need not be quoted though URLs with special characters must be quoted.
A format argument may be appended to a URL reference.
Other valid operator arguments control the body and headers of the HTTP request that implement the data retrieval and include:
- method
<method> - headers
<expr> - body
<string>
where
<method>is one ofGET,PUT,POST, orDELETE,<expr>is a record expression that defines the names and values to be included as HTTP header options, and<body>is a text-entity string to be included as the body of the HTTP request.
Currently, the headers expression must evaluate to a compile-time constant though this may change to allow dynamic computation in a future version of SuperSQL.
Expression
The eval() form of from provides a means to read data programmatically from
sources based on the <expr> argument to eval, which should return
a value of type string.
In this case, from reads values from its parent, applies <expr> to each
value, and interprets the string result as a target to be processed.
Each string value is interpreted as a from target and must be a file path
(when running detached from a database), a pool name (when attached to a database),
or a URL forming a sequence of targets which are read and output by the
from operator in the order encountered.
Combining Data
To combine data from multiple sources using pipe operators, from may be
used in combination with other operators like fork and join.
For example, multiple pools can be accessed in parallel and combined in undefined order:
fork
( from PoolOne | op1 | op2 | ... )
( from PoolTwo | op1 | op2 | ... )
| ...
or joined according to a join condition:
fork
( from PoolOne | op1 | op2 | ... )
( from PoolTwo | op1 | op2 | ... )
| join as {left,right} on left.key=right.key
| ...
Alternatively, the right-hand leg of the join may be written as a subquery of join:
from PoolOne | op1 | op2 | ...
| join ( from PoolTwo | op1 | op2 | ... )
as {left,right} on left.key=right.key
| ...
File Examples
Source structured data from a local file
echo '{greeting:"hello world!"}' > hello.sup
super -s -c 'from hello.sup | values greeting'
=>
"hello world!"
Source data from a local file, but in “line” format
super -s -c 'from hello.sup (format line)'
=>
"{greeting:\"hello world!\"}"
HTTP Example
Source data from a URL
super -s -c 'from https://raw.githubusercontent.com/brimdata/super/main/package.json
| values name'
=>
"super"
Database Examples
The remaining examples below assume the existence of the SuperDB database created and populated by the following commands:
export SUPER_DB=example
super db -q init
super db -q create -orderby flip:desc coinflips
echo '{flip:1,result:"heads"} {flip:2,result:"tails"}' |
super db load -q -use coinflips -
super db branch -q -use coinflips trial
echo '{flip:3,result:"heads"}' | super db load -q -use coinflips@trial -
super db -q create numbers
echo '{number:1,word:"one"} {number:2,word:"two"} {number:3,word:"three"}' |
super db load -q -use numbers -
super db -f text -c '
from :branches
| values pool.name + "@" + branch.name
| sort'
The database then contains the two pools and three branches:
coinflips@main
coinflips@trial
numbers@main
The following file hello.sup is also used.
{greeting:"hello world!"}
Source data from the main branch of a pool
super db -db example -s -c 'from coinflips'
=>
{flip:2,result:"tails"}
{flip:1,result:"heads"}
Source data from a specific branch of a pool
super db -db example -s -c 'from coinflips@trial'
=>
{flip:3,result:"heads"}
{flip:2,result:"tails"}
{flip:1,result:"heads"}
Count the number of values in the main branch of all pools
super db -db example -f text -c 'from * | count()'
=>
5
Join the data from multiple pools
super db -db example -s -c '
from coinflips
| join ( from numbers ) on left.flip=right.number
| values {...left, word:right.word}
| sort'
=>
{flip:1,result:"heads",word:"one"}
{flip:2,result:"tails",word:"two"}
Use pass to combine our join output with data from yet another source
super db -db example -s -c '
from coinflips
| join ( from numbers ) on left.flip=right.number
| values {...left, word:right.word}
| fork
( pass )
( from coinflips@trial
| c:=count()
| values f"There were {c} flips" )
| sort this'
=>
"There were 3 flips"
{flip:1,result:"heads",word:"one"}
{flip:2,result:"tails",word:"two"}
Expression Example
Read from dynamically defined files and add a column
echo '{a:1}{a:2}' > a.sup
echo '{b:3}{b:4}' > b.sup
echo '"a.sup" "b.sup"' | super -s -c "from eval(this) | c:=coalesce(a,b)+1" -
=>
{a:1,c:2}
{a:2,c:3}
{b:3,c:4}
{b:4,c:5}