October 10th, 2022
This is a post about Marigold, a programming language for operating on async streams that compiles to Rust.
This class of deadlock is not unique to Rust. It is similar to a common class of deadlock in many Java applications, where future A is being executed and blocks until future B completes, preventing any other future (including B) from being run and blocking the thread. If many A-needs-B futures are ran at once, all threads of the executor can be filled with future As, preventing any future Bs from being processed and causing the program to hang. These deadlocks are not easy to detect statically, and present a non-trivial risk to async programmers in Rust and Java.
"await A?? NO POLL B!! ONLY AWAIT A" (image source)
Marigold is not prone to this kind of deadlock. In Marigold, programs are composed as sequences of operations over Rust streams. A stream in Rust is "[an object] that can asynchronously produce a sequence of values" (from the
futures::stream docs). Marigold allows streams to be mapped, or filtered, or consumed by one or (unlike normal Rust streams) multiple child streams. Eventually, every sequence terminates in some kind of output: a stream may write its serialized output to a file, or return its output to a parent Rust program.
In Marigold, every stream in a program is started at the same time, and is run at least concurrently and generally in parallel. As the stream abstraction enforces that stream operations occurs sequentially for a given datum, and as all streams are being run at least concurrently, it is not possible to represent a program in pure Marigold that would produce a deadlock from a future A awaiting an unpolled future B. The grammar protects the programmer from this class of deadlock.
As a note on how Marigold executes streams: when a Marigold program starts, the streams don't necessarily all run at once. Each stream terminates in some kind of output which provides backpressure. The slowest output path determines the backpressure for each stream. Backpressure is a fundamental concept in contemporary software engineering, and is often talked about as a way to manage system resources: without backpressure, streaming services are susceptible to performance degradation or out-of-memory errors, as the machine's memory is filled with partially processed data. It's true that Marigold's memory usage is generally quite stable because of backpressure.2
But, backpressure is not just a "brake" to avoid running out of memory. It can also be a way to offload the work of correctly sequencing complex operations from the programmer to the asynchronous runtime. Backpressure allows us to design a programming language, like Marigold, that removes an insidious class of deadlock from asychronous programming in Rust, while still benefitting from Rust's performance and safety and being entirely interoperable with Rust programs.
I'm quite partial to domain specific languages (DSLs), and I think this class of deadlock is a great example of why DSLs that compile to Rust are useful complement to pure Rust: they can provide specific grammars and abstractions that enforce essential invariants. In the same way that Rust's type system prevents accidentally unsafe operations, DSLs can enforce specific coding patterns, remove classes of errors, and make it easy to solve specific, complex problems, either in standalone programs or within a broader Rust application. Critically, DSLs can achieve this while allowing programmers to write less code, and to write code that is easier to read.