Marigold 🏵️

Marigold is an imperative, domain-specific language for data pipelining and analysis using async streams. It can be used as a standalone language or within Rust programs.

Examples

Marigold Script

The following standalone Marigold program reads a gzipped CSV file whose rows have two columns, name and hull_configuration:

enum Hull {
  Spherical = "spherical",
  Split = "split",
}

struct Ship {
  name: string_8,
  hull_configuration: Hull,
}

fn is_spherical(s: &Ship) -> bool {
  s.hull_configuration == Hull::Spherical
}

read_file("ships.csv.gz", csv, struct=Ship)
  .ok_or_panic()
  .filter(is_spherical)
  .to_file("spherical.csv.gz", csv)

By default, files whose names end in .gz are treated as gzipped for both reading and writing. Here, the compressed CSV input is parsed into Ship values, the ships are filtered, and those with spherical hulls are written to the output file as gzipped CSV.

Marigold in Rust

Rust applications can use Marigold through the m! macro. Marigold integrates into the parent application and accepts its Rust structs, enums, and functions. Note that familiar heap-allocating types, such as String, work with the same grammar:

use futures::stream::StreamExt;
use marigold::m;
use serde::Deserialize;
use serde::Serialize;

#[derive(
  Eq,
  PartialEq,
  Serialize,
  Deserialize,
  Debug
)]
struct Ship {
  class: String,
  hull: String,
}

fn is_spherical(ship: &Ship) -> bool {
  // `hull` is a String, so match on its &str view.
  match ship.hull.as_str() {
    "spherical" => true,
    _ => false
  }
}

#[tokio::main]
async fn main() {
  let ships = m!(
    read_file(
      "ships.csv",
      csv,
      struct=Ship
    )
      .ok_or_panic()
      .filter(is_spherical)
      .return
  )
    .await
    .collect::<Vec<_>>()
    .await;

  println!(
    "Best classes: {:?}",
    ships
  );
}

Language Features

Marigold is a domain-specific language for operating on streams of data. It provides a readable, parallelism-by-default grammar, as well as implicit de/serialization and de/compression while reading from network or I/O sources. It compiles to Rust source code, and so benefits from Rust's performance and memory safety. It is open source and dual-licensed under Apache-2.0 and MIT.

As a standalone language, Marigold uses a popular multi-threaded work-stealing asynchronous runtime from the Rust ecosystem (currently Tokio), allowing for efficient parallelism.

When integrated into a Rust program, Marigold is runtime-agnostic by default. It runs a single future (non-parallel) that can be processed by any Rust async runtime. To support spawning new futures (and, for multi-threaded runtimes, parallelism), the only configuration necessary is activating the runtime-specific library feature for the Marigold dependency in the Rust project's Cargo.toml. Currently, Tokio and Async-std are supported.
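To make "a single future that can be processed by any Rust async runtime" concrete, here is a minimal sketch that uses only the standard library. The block_on executor and the count_spherical function are hypothetical stand-ins and are not part of Marigold. The point is only that a future with no runtime-specific calls can be driven to completion by any executor, even a tiny hand-written one.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the executor by unparking the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal executor: polls one future to completion
/// on the current thread, with no parallelism.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Stand-in for a pipeline: an ordinary future that makes no
/// runtime-specific calls (assumption for illustration only).
async fn count_spherical(hulls: Vec<&'static str>) -> usize {
    hulls.into_iter().filter(|h| *h == "spherical").count()
}

fn main() {
    let n = block_on(count_spherical(vec!["spherical", "split", "spherical"]));
    println!("{n} spherical hulls");
}
```

Swapping block_on for Tokio's or async-std's executor would not require changing count_spherical, which is the property that runtime-agnostic code relies on.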

Marigold uses a subset of the Rust type system when declaring structs and enums. All types defined in Marigold have a fixed size and implement Rust's Copy trait, meaning that a value can be duplicated merely by copying its bytes, with no heap allocation involved. Fixed-size values, combined with immutable data, have performance benefits and facilitate both parallelism and multi-consumer streams.
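As a rough illustration in plain Rust, a fixed-size, Copy-able record might look like the sketch below. The 8-byte array for the name and the fixed_name helper are assumptions made for this example; they are stand-ins for a bounded string type and do not show the code Marigold actually generates.

```rust
use std::mem::size_of;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Hull {
    Spherical,
    Split,
}

/// Hypothetical stand-in for a Marigold struct: the name lives in a
/// fixed 8-byte buffer instead of a heap-allocated String, so the
/// whole value has a known size and can derive Copy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ship {
    name: [u8; 8],
    hull: Hull,
}

/// Copies up to 8 bytes of `s` into a zero-padded buffer.
/// Longer input is truncated at the byte level.
fn fixed_name(s: &str) -> [u8; 8] {
    let mut buf = [0u8; 8];
    let n = s.len().min(8);
    buf[..n].copy_from_slice(&s.as_bytes()[..n]);
    buf
}

fn main() {
    let a = Ship { name: fixed_name("Nauvoo"), hull: Hull::Spherical };
    let b = a; // plain copy of the bytes; `a` remains usable afterwards
    assert_eq!(a, b);

    let c = Ship { name: fixed_name("Nauvoo"), hull: Hull::Split };
    assert_ne!(a, c);

    println!("Ship occupies {} bytes", size_of::<Ship>());
}
```

Because copying such a value never touches the heap, handing the same item to several consumers or worker tasks needs no reference counting or locking.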

Marigold streams can have multiple consumers. When any consumer's input buffer is full, the stream stops processing new data until it can write to its consumers again (backpressure). This tight coupling avoids memory spikes.
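The backpressure behaviour can be sketched with a bounded channel from the Rust standard library. This is an analogy under stated assumptions, not Marigold's internals: sync_channel stands in for a consumer's input buffer, and accepted_before_full is a hypothetical helper written for this example.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Counts how many items a bounded buffer accepts before it is full.
fn accepted_before_full(capacity: usize) -> usize {
    // `_rx` keeps the receiving side alive but never reads from it.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    while tx.try_send(accepted).is_ok() {
        accepted += 1;
    }
    accepted
}

fn main() {
    // A 2-slot buffer stands in for one consumer's input buffer.
    let (tx, rx) = sync_channel::<u32>(2);
    tx.try_send(1).unwrap();
    tx.try_send(2).unwrap();

    // The buffer is full, so the producer must wait instead of queueing more.
    assert!(matches!(tx.try_send(3), Err(TrySendError::Full(3))));

    // Once the consumer takes an item, the producer can make progress again.
    assert_eq!(rx.recv().unwrap(), 1);
    tx.try_send(3).unwrap();

    println!("a 2-slot buffer accepted {} items", accepted_before_full(2));
}
```

Since the producer can never run ahead of the consumer by more than the buffer size, memory use stays bounded regardless of how fast the source produces data.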

Sample Projects