Marigold is an imperative, domain-specific language for data pipelining and analysis using async streams. It can be used as a standalone language or within Rust programs.
This is a standalone Marigold program. Marigold reads a CSV file containing rows with two columns,
read_file .ok_or_panic .filter .to_file
By default, CSV files whose names end in .gz are treated as gzipped for both reading and writing: the compressed CSV input is parsed into Ship objects, the Ships are filtered, and those with spherical hulls are written to the output file in gzipped CSV format.
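For readers more familiar with Rust than Marigold, the same parse, unwrap, filter, and write steps can be sketched with plain iterators. This is only an analogy under stated assumptions, not Marigold's generated code: the column layout (a numeric id followed by a hull name), the Split variant, and the helper names parse_ship, to_row, and spherical_rows are hypothetical, and gzip handling is left out so the sketch needs nothing beyond the standard library.

```rust
// Hypothetical two-column layout: the text only says each row has two columns.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Hull {
    Spherical,
    Split,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Ship {
    id: u32,
    hull: Hull,
}

/// Parse one CSV row such as "7,spherical" into a Ship.
fn parse_ship(row: &str) -> Result<Ship, String> {
    let (id, hull) = row
        .split_once(',')
        .ok_or_else(|| format!("expected two columns: {row:?}"))?;
    let id = id
        .trim()
        .parse::<u32>()
        .map_err(|e| format!("bad id in {row:?}: {e}"))?;
    let hull = match hull.trim() {
        "spherical" => Hull::Spherical,
        "split" => Hull::Split,
        other => return Err(format!("unknown hull: {other}")),
    };
    Ok(Ship { id, hull })
}

/// Serialize a Ship back into a CSV row.
fn to_row(ship: &Ship) -> String {
    let hull = match ship.hull {
        Hull::Spherical => "spherical",
        Hull::Split => "split",
    };
    format!("{},{}", ship.id, hull)
}

/// Read, panic on bad rows, keep spherical hulls, and serialize the rest.
fn spherical_rows(input: &str) -> Vec<String> {
    input
        .lines()
        .map(parse_ship)
        .map(|row| row.expect("row should parse")) // plays the role of ok_or_panic
        .filter(|ship| ship.hull == Hull::Spherical)
        .map(|ship| to_row(&ship))
        .collect()
}

fn main() {
    let input = "1,spherical\n2,split\n3,spherical";
    for row in spherical_rows(input) {
        println!("{row}");
    }
}
```

Unlike this eager, in-memory version, a Marigold stream processes rows asynchronously as they arrive.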
Marigold in Rust
Rust applications can use Marigold through the m! macro. Marigold integrates into the parent application, accepting Rust structs, enums, and functions. Note that familiar heap-allocating types, such as String, work with the same grammar:
use StreamExt; use m; use Deserialize; use Serialize; async
- fully async network/IO
- built-in serialization and compression
- pure functions
- immutable, stack-allocatable objects
Marigold is a domain-specific language for operating on streams of data. It provides a readable, parallelism-by-default grammar, as well as implicit de/serialization and de/compression while reading from network or I/O sources. It compiles to Rust source code, and so benefits from Rust's performance and memory safety. It is open source and dual-licensed under Apache-2.0 and MIT.
As a standalone language, Marigold uses a popular multi-threaded work-stealing asynchronous runtime from the Rust ecosystem (currently Tokio), allowing for efficient parallelism.
When integrated into a Rust program, Marigold is runtime-agnostic by default: it runs as a single, non-parallel future that any Rust async runtime can drive. To support spawning new futures (and, on multi-threaded runtimes, parallelism), the only configuration necessary is activating the runtime-specific library feature for the Marigold dependency in the Rust project's Cargo.toml. Currently, Tokio and async-std are supported.
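To make "a single future that any runtime can process" concrete, here is a minimal sketch that uses only the standard library. The block_on executor and the pipeline function are hypothetical stand-ins (pipeline is ordinary async Rust, not Marigold output); the point is that a future which never spawns tasks depends only on being polled, so even a ten-line executor can run it.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-threaded executor: poll the future until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(), // sleep until the waker fires
        }
    }
}

/// Hypothetical async step in a pipeline.
async fn double(x: u32) -> u32 {
    x * 2
}

/// Hypothetical pipeline: one future, no spawning, so no runtime-specific API.
async fn pipeline(inputs: &[u32]) -> u32 {
    let mut total = 0;
    for &x in inputs {
        total += double(x).await;
    }
    total
}

fn main() {
    let total = block_on(pipeline(&[1, 2, 3]));
    println!("total = {total}");
}
```

Spawning is where runtimes differ, which is why parallel execution requires opting into a runtime-specific feature.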
Marigold uses a subset of the Rust type system when declaring classes. All object definitions have a fixed size and implement Rust's Copy trait, meaning that they can be duplicated merely by copying the contents of the object's stack allocation. Fixed-size objects, combined with immutable data, have performance benefits and facilitate both parallelism and multi-consumer streams.
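In plain Rust terms, a fixed-size Copy type looks like the sketch below. The Ship and Hull definitions are illustrative assumptions (the field names are hypothetical), not the code Marigold emits; what matters is that every field is itself Copy, so handing the value to a second consumer is a plain bitwise copy with no heap allocation or reference counting.

```rust
use std::mem::size_of;

// Hypothetical definitions: every field is fixed-size, so Copy can be derived.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Hull {
    Spherical,
    Split,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Ship {
    id: u32,
    hull: Hull,
}

/// Takes the Ship by value; the caller keeps its own copy.
fn is_spherical(ship: Ship) -> bool {
    ship.hull == Hull::Spherical
}

fn main() {
    let ship = Ship { id: 1, hull: Hull::Spherical };

    // Each assignment copies the stack bytes; `ship` stays usable afterwards.
    let for_consumer_a = ship;
    let for_consumer_b = ship;

    assert!(is_spherical(for_consumer_a));
    assert_eq!(for_consumer_b, ship);
    println!("a Ship occupies {} bytes on the stack", size_of::<Ship>());
}
```

A struct containing a String could not derive Copy, because duplicating it requires allocating a new heap buffer.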
Marigold streams can have multiple consumers. When any consumer's input buffer is full, the stream stops processing new data until it can write to its consumers again (backpressure). This tight coupling avoids memory spikes.
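The backpressure behavior can be illustrated with bounded channels from the Rust standard library. This sketch is an analogy, not Marigold's implementation: it uses blocking threads instead of async streams, and fan_out and run are hypothetical names. Each consumer has a buffer of a fixed capacity, and the producer's send blocks whenever any one of those buffers is full, so at most a bounded number of items is ever in flight.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Send every item to every consumer. `send` blocks while that consumer's
/// buffer is full, which stalls the whole producer (backpressure).
fn fan_out(items: Vec<u32>, consumers: Vec<SyncSender<u32>>) {
    for item in items {
        for tx in &consumers {
            tx.send(item).expect("consumer hung up");
        }
    }
    // `consumers` is dropped here, closing the channels so receivers finish.
}

/// Run one producer with two consumers, each with a buffer of `buffer` items.
fn run(items: Vec<u32>, buffer: usize) -> (u32, usize) {
    let (tx_sum, rx_sum) = sync_channel::<u32>(buffer);
    let (tx_count, rx_count) = sync_channel::<u32>(buffer);

    // Consumer 1 sums the items; consumer 2 counts them.
    let sum = thread::spawn(move || rx_sum.iter().sum::<u32>());
    let count = thread::spawn(move || rx_count.iter().count());

    fan_out(items, vec![tx_sum, tx_count]);

    (sum.join().unwrap(), count.join().unwrap())
}

fn main() {
    let (sum, count) = run(vec![1, 2, 3, 4], 1);
    println!("sum = {sum}, count = {count}");
}
```

With a buffer of one, the producer can never run more than a couple of items ahead of the slowest consumer, which is the memory-bounding effect described above.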