Preventing a Class of Deadlocks in Marigold

This is a post about Marigold, a programming language for operating on async streams that compiles to Rust.

Different programming languages support asynchronous code execution using different models. In Javascript, futures are added to the event loop where they run concurrently "in the background" of the main process. In Erlang and Go, asynchronous computation occurs across green threads. In Rust, futures are only polled by the executor when they are awaited: a future that is never used is never run.¹

This exposes Rust applications to a class of deadlocks that do not exist in Javascript- or Erlang- style executors. In any async language, if future A relies on future B, and the program polls A but not B, then the program will hang. In a language like Rust, where futures are only polled when they are awaited, it is much easier to poll a future A and never poll future B. A polybdenum blog post describes how some foundational async Rust APIs are prone to this class of error.

The function partition_by_hash from Marigold's internals is prone to this kind of deadlock. It takes a single stream and returns a stream of streams, splitting the inputs from the original stream according to the output of a partitioning function. Using bounded channels, the program introduces backpressure on the parent stream if any child stream stops accepting new input.

If called without select! or task::spawn, the program will deadlock as soon as multiple values with the same hash are recieved, like in this example:

let mut partitioned_stream =
  futures::stream::iter(0..=255)
    .partition_by_hash(|&i| i % 10);
while let Some(partition) = partitioned_stream
.next()
.await {
partition_futs.push(
partition
.collect::<Vec<_>>()
.await
);
}

The deadlock occurs because we write each datum to a child stream, and wait to see if the datum has a new hash, causing a new child stream to be produced. If the substream's buffer is full, then we wait until a value from the buffer is taken and processed. But, since we are waiting to see if we receive a new child stream (the "future A" in this deadlock), we cannot empty the child stream's full buffer ("future B"). Depending on the program, a solution could be to spawn a task to consume each child stream, as so:

let mut partitioned_stream =
  futures::stream::iter(0..=255)
    .partition_by_hash(|&i| i % 10);
while let Some(partition) = partitioned_stream
.next()
.await {
partition_futs
.push(tokio::spawn(
async move {
partition
.collect::<Vec<_>>()
.await
}
));
}

It's an easy mistake to make, and today, the Rust compiler can't surface the issue. While novel to people in the JS/erlang/Go world, this kind of deadlock isn't new or specific to Rust's concept of futures. It is similar to a common class of deadlock in many Java applications, where future A is being executed and blocks until future B completes, preventing any other future (including B) from being run and blocking the thread. If many A-needs-B futures are ran at once, a time may arrive where all threads of the executor are filled with future As, preventing any future Bs from being processed and causing the program to hang. These deadlocks are not easy to detect statically, and present a non-trivial risk to async programmers in Rust and Java.

"await A?? NO POLL B!! ONLY AWAIT A" (image source)

Marigold is not prone to this kind of deadlock. In Marigold, programs are composed as sequences of operations over Rust streams. A stream in Rust is "[an object] that can asynchronously produce a sequence of values" (from the futures::stream docs). Marigold allows streams to be mapped, or filtered, or consumed by one or (unlike normal Rust streams) multiple child streams. Eventually, every sequence terminates in some kind of output: a stream may write its serialized output to a file, or return its output to a parent Rust program.

In Marigold, every stream in a program is started at the same time, and is run at least concurrently and generally in parallel. As the stream abstraction enforces that stream operations occurs sequentially for a given datum, and as all streams are being run at least concurrently, it is not possible to represent a program in pure Marigold that would produce a deadlock from a future A awaiting an unpolled future B. The grammar protects the programmer from this class of deadlock, and it's much easier to "be careful" (and test) deadlock-prone abstractions like partition_by_hash once within Marigold's internal implementation than to expose the potential for deadlock to every client application.

Backpressure Isn't Just for Constraining Memory Usage

Each Marigold stream terminates in some kind of output, which provides backpressure. The slowest output path determines the backpressure for each stream. Backpressure is a fundamental concept in contemporary software engineering, and is often talked about as a way to manage system resources: without backpressure, streaming services are susceptible to performance degradation or out-of-memory errors, as the machine's memory is filled with partially processed data. And Marigold's memory usage is generally quite stable because of backpressure.²

But, backpressure is not just a "brake" to avoid running out of memory. It can also be a way to offload the work of correctly sequencing complex operations from the programmer to the asynchronous runtime. Backpressure allows us to design a programming language, like Marigold, that removes an insidious class of deadlock from asychronous programming in Rust, while still benefitting from Rust's performance and safety, and while being entirely interoperable with Rust programs.

I'm quite partial to domain specific languages (DSLs), and I think this class of deadlock is a great example of why DSLs that compile to Rust are useful complement to pure Rust: they can provide specific grammars and abstractions that enforce essential invariants. In the same way that Rust's type system prevents accidentally unsafe operations, DSLs can enforce specific coding patterns, remove classes of errors, and make it easy to solve specific, complex problems, either in standalone programs or within a broader Rust application. Critically, DSLs can achieve this while allowing programmers to write less code, and to write code that is easier to read.

Of course, you don't have to just await futures in Rust. You can poll an individual future instead of awaiting it, or use functions like futures::future::join_all to concurrently poll multiple futures.

Rust also supports greenthreading through tasks: the difference between Rust's async environment and Erlang/Go's is that greenthreading is one way to process Rust futures, while Erlang/Go provide greenthreads as their primary abstraction for asynchronous computation.

While I'm a big fan of greenthreads, there are a lot of really compelling reasons to expose futures. It gives Rust programmers good ergonomics for passing data to and from asynchronous blocks of computation, and is built to allow highly performant code. The foundation of async programming in Rust is the desire to make futures a zero-cost abstraction over bespoke callback-based synchronous code: to provide a relatively clear, simple API over the complex domain of asynchronous programming without performance penalties.

With exceptions: Marigold exposes convenience APIs for some operations that require additional memory usage, such as permutations and combinations. For these stream methods, both the memory usage and computational complexity is dependent on the size of the input.

The goal of Marigold is not to prevent superlinear operations– this would prevent the language from being useful in the domain of scientific computing. The programmer should instead ideally be made aware of the computational and memory complexity of the application they are writing.

In the long-term, the memory and computational requirements of each stream method can be expressed within Marigold's type system. Marigold is designed so that, while the exact memory and CPU requirements of each operation are unknown, they can be consistenty approximated and represented in big-O notation. Big-O notation for each operation can be combined symbolically to describe the complexity of each component stream as well as the entire Marigold program.

The eventual goal of representing complexity symbolically in the type system would be for two primary use-cases: the first would be a static analysis tool that could compare the relative complexity of two versions of the same program (integrated with version control), and the second would be a syntax highlighter that could display key information about the complexity of Marigold programs in an IDE for programmers.