fuse

Table of Contents

Operator

fuse — coerce all input values into a merged type

Synopsis

fuse

Description

The fuse operator reads all of its input, computes an “intelligent merge” of varied types in the input, then adjusts each output value to conform to the merged type.

The merged type is constructed intelligently in the sense that type {a:string} and {b:string} is fused into type {a:string,b:string} instead of the union type ({a:string},{b:string}).

TBD: document the algorithm here in more detail. The operator takes no paramters but we are experimenting with ways to control how field with the same name but different types are merged especially in light of complex types like arrays, sets, and so forth.

Because all values of the input must be read to compute the union, fuse may spill its input to disk when memory limits are exceeded.

Fuse is not normally needed for super-structured data as the super data model supports heterogenous sequences of values. However, fuse can be quite useful during data exploration when sampling or filtering data to look at slices of raw data that are fused together. Fuse is also useful for transforming arbitrary super data to prepare it for formats that require a uniform schema like Parquet or a tabular structure like CSV. Unfortunately, when data leaves a super-structured format using fuse to accomplish this, the original data must be altered to fit into the rigid structure of these output formats.

A fused type over many heterogeneous values also represents a typical design pattern of a data warehouse where a single very-wide schema defines slots for all possible input values where the columns are sparsely populated by each row value as the missing columns are set to null. Super data is super-structured, and fortunately, does not require such a structure.

Examples

Fuse two records

Error:
echo '{a:1}{b:2}' | super -z -c fuse -

Fuse records with type variation

Error:
echo '{a:1}{a:"foo"}' | super -z -c fuse -

Fuse records with complex type variation

Error:
echo '{a:[1,2]}{a:["foo","bar"],b:10.0.0.1}' | super -z -c fuse -

The table format clarifies what fuse does

Error:
echo '{a:1}{b:2}{a:3}' | super -f table -c fuse -
Next: get

SuperDB