Why Turbolift

Recently, I've been working on a distribution interface in Rust. It's called Turbolift, and it's designed to extract functions and run them as microservices. I started writing it because I felt there was room for a much nicer default for simple distribution cases (embarrassingly parallel tasks that don't require storage). In this post, I'll discuss why I think something like Turbolift should exist, and talk about Turbolift's design.

With Turbolift, I wanted to encode this kind of "simple" distribution into the source code for an application by leveraging the power of Rust's procedural macros. Procedural macros are a powerful metaprogramming tool: they let you interrupt Rust's parser and execute arbitrary code at compile time. In this case, the goal was to tag a specific function with a procedural macro, and then to extract the source code for that function and its dependencies automatically.

What does the code look like?

Let's look at an example. To start, here is the complete source code for a non-distributed program that finds the squares of a few random numbers:

#[macro_use(c)]
extern crate cute;
use rand::{thread_rng, Rng};

fn square(u: u64) -> u64 {
    u * u
}

fn random_numbers() -> Vec<u64> {
    let mut pseud = thread_rng();
    c![pseud.gen_range(0, 1000), for _i in 1..10]
}

fn main() {
    let input = random_numbers();
    let output = c![square(*int), for int in &input];
    println!(
      "computation complete.\ninput: {:?}\noutput: {:?}",
      input, output
    );
    if output != c![x*x, for x in input] {
      std::process::exit(1)
    }
}

And here is the same program, distributed with Turbolift on Kubernetes:

#[macro_use]
extern crate lazy_static;
#[macro_use(c)]
extern crate cute;
use futures::future::try_join_all;
use rand::{thread_rng, Rng};
use tokio::sync::Mutex;

use turbolift::kubernetes::K8s;
use turbolift::on;

lazy_static! {
    static ref K8S: Mutex<K8s> = Mutex::new(
      K8s::with_max_replicas(2)
    );
}

#[on(K8S)]
fn square(u: u64) -> u64 {
    u * u
}

fn random_numbers() -> Vec<u64> {
    let mut pseud = thread_rng();
    c![pseud.gen_range(0, 1000), for _i in 1..10]
}

fn main() {
    let input = random_numbers();
    let futures = c![square(*int), for int in &input];
    let mut rt = tokio::runtime::Runtime::new().unwrap();
    let output = rt.block_on(
      try_join_all(futures)
    ).unwrap();
    println!(
      "computation complete.\ninput: {:?}\noutput: {:?}",
      input, output
    );
    if output != c![x*x, for x in input] {
      std::process::exit(1)
    }
}

With 17 additional lines of code, we have declared "K8S" (a global state that handles interaction with the cluster), added a macro tag to the square function, and added an asynchronous runtime.

What is Turbolift actually doing? 🧐

Good question! At compile time, when the parser hits the procedural macro "#[on(K8S)]", it extracts the source code for the square function and automatically rewrites it as a microservice. The source code for the new microservice is set aside by the compiler, to be included in the final executable for the main service. In the abstract syntax tree for the main program, we insert a replacement for the square function that calls the microservice we generated. We add some extra type constraints to the replacement, letting us leverage the type checker to guarantee that the input and output of the function are serializable. We also make the replacement function async, since it now makes a network call that we don't want to block on. This is why it was necessary to add the asynchronous runtime.

At runtime, the source code for the microservice will be extracted from the main program's executable. The code will be processed by Turbolift into a form ingestible by the cluster manager (here, we'll build an image for the microservice and tell Kubernetes how to run it as a microservice). The microservice will be programmatically deployed and, when the main program is finished with the given resources, automatically removed. Each time the function is called, Turbolift takes the input, serializes it, and sends it to the cluster; when a result is sent, Turbolift deserializes it and returns it.
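
To make that call path concrete, here is a minimal, standard-library-only sketch of the serialize, send, deserialize round trip. It is not Turbolift's generated code: the `Wire` trait, `call_remote`, and `serve_square` are hypothetical names made up for illustration, the "network" is just a function passed in as `transport`, and the call is synchronous where the real replacement is async.

```rust
/// Hypothetical stand-in for "serializable": a type that can cross the wire.
trait Wire: Sized {
    fn to_bytes(&self) -> Vec<u8>;
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
}

impl Wire for u64 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }

    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        // Reject payloads that are not exactly eight bytes long.
        if bytes.len() != 8 {
            return None;
        }
        let mut arr = [0u8; 8];
        arr.copy_from_slice(bytes);
        Some(u64::from_le_bytes(arr))
    }
}

/// The function body as it would run inside the microservice.
fn square(u: u64) -> u64 {
    u * u
}

/// Service side: decode the request, run the function, encode the response.
fn serve_square(request: &[u8]) -> Option<Vec<u8>> {
    let input = u64::from_bytes(request)?;
    Some(square(input).to_bytes())
}

/// Caller side: the rough shape of the replacement function. The `Wire`
/// bounds make the compiler reject inputs or outputs that cannot be encoded.
fn call_remote<I, O, T>(input: &I, transport: T) -> Option<O>
where
    I: Wire,
    O: Wire,
    T: Fn(&[u8]) -> Option<Vec<u8>>,
{
    let request = input.to_bytes(); // serialize the argument
    let response = transport(&request)?; // a network call in a real system
    O::from_bytes(&response) // deserialize the result
}
```

Passing `serve_square` as the transport, `call_remote::<u64, u64, _>(&12, serve_square)` round-trips through bytes and yields `Some(144)`. A type without a `Wire` impl would be rejected at compile time, which is the kind of guarantee the extra type constraints are there to provide.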

This is a weird approach! Usually, orchestration technologies are separate from the code. Here, we're essentially refactoring the codebase from a monolith into a microservice architecture every time we compile.

That's way too magic! 🧙

Perhaps! To be honest, I was reluctant to start work on this project. It's really hard to write code for distributed systems without knowing which vendors or cluster managers you'll be using, and the reason k8s and similar programs are written "inside out" is that magic interfaces make it difficult or impossible to tune cluster performance. If you need Kubernetes-level customizability, you need Kubernetes-level interface complexity.

The goal of Turbolift is to make it really easy to distribute scripts that have to perform the same intensive process on a lot of inputs. By adding constraints to the domain, we can use simpler abstractions. Ideally, we can leverage the compiler to verify that programs meet the requirements of this simpler abstraction, and the "magic" is paid for. In cases where Turbolift can't distribute a program, the program should ideally fail to compile, so that the programmer can decide whether to use a lower-level abstraction (like k8s) or alter their program.

Still, distribution managers are fickle. For example, while using an untested configuration of plugins for k8s, it's likely that there will be occasional bugs. Examining the generated microservice source code is possible (the source code for each microservice is stored in a cache within the project directory, so it's easy to view). Even so, it might be harder to diagnose whether issues are coming from the project, from Turbolift failing to work with the k8s plugins, or from the k8s plugins failing to work together, simply because you might not be familiar with how Turbolift implements the microservice code.

But I still felt like there was a distinct need for simpler distribution interfaces. Not every application benefits from hand-rolled, bespoke cluster configurations. I also felt that a cluster-agnostic way of describing microservice architectures could significantly mitigate tech debt and center system design within the code. Critically, I also felt that it was possible to make applications that are conceptually easy to distribute much easier to actually distribute.

Distributing computation in a script is a very different task than distributing work in a versioned, complex, web-facing application, and it makes sense that the two tasks would use different interfaces. Simpler distribution patterns allow for simpler interfaces.

In the Rust ecosystem, there are parallelization libraries like Rayon that internalize a lot of the complexity of parallelism; why shouldn't there be an equivalent for distribution?

Why is this cool? 🆒

To see why something like Turbolift might be useful, let's look at another example. You are writing a text processing program in Rust, and find that a single step (applying intense compression to a bunch of 10KB-1MB strings) is taking up the vast majority of your CPU usage. Since your application is CPU-constrained and you have a fast network, it makes sense to distribute the work to other devices.

Let's walk through the steps to distribute this compression function on Kubernetes. You will need to refactor the compression function into its own project, and build an image of that project with Docker or similar. The cluster needs access to a private or local repository that contains the image, which requires non-trivial configuration (configuration that also varies across k8s implementations). Then you must define a Pod with an associated container (so that the code and environment are defined), as part of a Deployment (so that the cluster can automatically handle node failures and changes in the number of requested Pods), which will then require a Service (to provide a single IP address that directs to an available Pod in the Deployment) and an autoscale daemon (to monitor the Deployment and requisition duplicate Pods if the Pods receive enough traffic to use more than 80% of their allocated CPU time, or to remove excess Pods if the Service is being underutilized). Some kind of ingress will have to be enabled to allow external access to the Service. Finally, the application needs to be refactored so that every time it finds a string that needs to be compressed, it sends a request to the static IP for the Service and awaits the response.
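
As a rough illustration of what that autoscaler computes, the sketch below applies the scaling rule described in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas * current utilization / target utilization). The function name and the min/max clamp parameters are my own, and the real autoscaler also applies a tolerance band and stabilization windows that are left out here.

```rust
/// Replica count under the simplified rule
/// desired = ceil(current_replicas * current_cpu_pct / target_cpu_pct),
/// clamped to the range [min_replicas, max_replicas].
/// Utilization is expressed as a percentage of the CPU each Pod requested.
fn desired_replicas(
    current_replicas: u32,
    current_cpu_pct: u32,
    target_cpu_pct: u32,
    min_replicas: u32,
    max_replicas: u32,
) -> u32 {
    assert!(target_cpu_pct > 0, "target utilization must be positive");
    // Widen to u64 so the multiplication cannot overflow.
    let numerator = u64::from(current_replicas) * u64::from(current_cpu_pct);
    let target = u64::from(target_cpu_pct);
    // Integer ceiling division.
    let raw = (numerator + target - 1) / target;
    let clamped = raw
        .max(u64::from(min_replicas))
        .min(u64::from(max_replicas));
    clamped as u32
}
```

With a target of 80%, two Pods running at 120% of their CPU request scale to `desired_replicas(2, 120, 80, 1, 10) == 3`, while a cap of two replicas holds the count at 2.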

That's really complicated for someone who just wants to speed up the string compression part of their code!

Let's think about the consequences of writing this for K8s. After a refactor, the code would:

This is a significant technical investment and a likely source of tech debt. Right now, a lot of programs that could be distributed aren't, since it doesn't make sense to write and maintain distribution code. That's not great!

By creating a tool that internalizes that complexity, there is a better separation of concerns between infra development and application development. This separation of concerns is great for application developers, since they don't have to write infra code. It's also great from an infra standpoint, since it aggregates all of the orchestration logic in one place, instead of leaving it diffused across multiple call sites in multiple projects. If a project can use Turbolift, then it has the same architecture as every other Turbolift project, and that architecture can be collectively tested and maintained in a single open-source repository.

Despite the separation of concerns, the application infrastructure remains explicit to the application developers: every microservice is clearly tagged with a macro, which distinguishes it from local functions. In the same way that you don't have to understand how your hardware works in order to write software, you should not have to understand the mechanics of how a microservice architecture is specifically implemented in Kubernetes, Lambda, or similar in order to reason about the abstraction.

Application engineers shouldn't have to become infra engineers in order to apply basic distribution concepts; they should use the highest-level abstraction possible that allows them to effectively reason about their system. As an interface, Turbolift provides a higher level of abstraction for applications that don't need to worry about the complexity of web-facing backends.

Where doesn't Turbolift make sense?

Turbolift is designed for programming in the small. It's good when you want to make a script or small application work faster via distribution, but it's probably better to be more explicit about your microservice architecture if you're writing a larger application, and to actually split your repos / deployment pipelines.

Turbolift is designed for barely distributed programs: applications that start, do an isolated operation a bunch of times, and then complete. The important abstraction here is the idea of a microservice: if the scaling properties of a microservice system don't make sense in your context, then by extension Turbolift is not a good choice.

Some examples: if you require low, consistent latency, then the additional network calls implied by turning a function into a microservice might be an unacceptable scalability/performance tradeoff. Maybe you have a lot of tiny processes that need to be able to quickly communicate with one another and should thus stay on the same machine. Or, each operation requires a massive amount of data that you don't want to send over the wire. Perhaps your application makes and relies on global state or system side effects, which would not be shared across machines (and might be duplicated if they run when the program initializes, depending on how the microservice is extracted). In each of these cases, we are leaving the basic microservice distribution cases that Turbolift is designed for, and Turbolift would probably not be helpful. There are plenty of applications where microservice scaling properties could make sense (statistical permutation testing, frame rendering), and plenty where they probably don't (real-time data processing, agent-based modeling).
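
A back-of-the-envelope model can help decide which side of that line a workload falls on. The sketch below is only an assumption-laden estimate, not anything Turbolift computes: it assumes independent calls of equal cost, workers that each handle one call at a time, and a fixed per-call overhead for serialization and the network round trip. All names and numbers are invented for illustration.

```rust
/// Total time when every call runs one after another on the local machine.
fn local_ms(calls: u64, compute_ms: u64) -> u64 {
    calls * compute_ms
}

/// Rough total time when calls are spread evenly over `workers` machines.
/// Each call pays `overhead_ms` on top of its compute time, and the calls
/// proceed in rounds of at most `workers` calls at once.
fn distributed_ms(calls: u64, compute_ms: u64, overhead_ms: u64, workers: u64) -> u64 {
    assert!(workers > 0, "need at least one worker");
    // Integer ceiling division: number of rounds needed to cover all calls.
    let rounds = (calls + workers - 1) / workers;
    rounds * (compute_ms + overhead_ms)
}

/// True when the distributed estimate beats the local one.
fn worth_distributing(calls: u64, compute_ms: u64, overhead_ms: u64, workers: u64) -> bool {
    distributed_ms(calls, compute_ms, overhead_ms, workers) < local_ms(calls, compute_ms)
}
```

Under these assumptions, 1,000 calls of 50 ms each with 5 ms of overhead on 10 workers drop from 50,000 ms locally to 5,500 ms distributed. The same 1,000 calls at 1 ms each with 20 ms of overhead go from 1,000 ms to 2,100 ms, so distribution makes that program slower.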

Turbolift is also currently inappropriate for high-security projects. The current K8s implementation has not been audited. One notable concern is that we assume a secure network: everything is sent as plain HTTP requests, and we don't do anything to prevent snooping or man-in-the-middle attacks. Also, the microservice source code is currently copied into each microservice pod, which assumes the worker systems are secure. For most projects, this is probably acceptable if you're running e.g. a one-off private cloud or an airgapped Pi cluster, but it might not be for your specific application.

Additionally, the tradeoff of using a procedural macro to turn functions into microservices is increased compilation time. If it's essential that your application compiles quickly, then you probably shouldn't use Turbolift. Note that it's easy to turn distribution off, for example while developing, which speeds up compilation. This can be implemented as a crate feature for any application using Turbolift. See the Turbolift example projects for examples of distribution-as-a-feature.
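
For a sense of what distribution-as-a-feature can look like, here is a minimal sketch of the general Cargo feature pattern. It is not copied from the Turbolift example projects: the feature name `distributed` and the `run` function are hypothetical, and the distributed branch holds a local placeholder so that the sketch compiles without any dependencies.

```rust
// Assumed Cargo.toml entry (the feature name is hypothetical):
//
// [features]
// distributed = []

/// The work itself is the same in both builds.
fn square(u: u64) -> u64 {
    u * u
}

/// Built only with `--features distributed`. In a real project this is where
/// the macro-tagged function would be called and its futures awaited; the
/// body here is a local placeholder so the sketch compiles on its own.
#[cfg(feature = "distributed")]
fn run(input: &[u64]) -> (&'static str, Vec<u64>) {
    ("distributed", input.iter().map(|u| square(*u)).collect())
}

/// Built when the feature is off: no macro expansion and no cluster code,
/// so the development build stays fast.
#[cfg(not(feature = "distributed"))]
fn run(input: &[u64]) -> (&'static str, Vec<u64>) {
    ("local", input.iter().map(|u| square(*u)).collect())
}
```

A default `cargo build` takes the local path, and `cargo build --features distributed` switches to the distributed one.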

What's next? 🔜

I hope that as Turbolift matures, it will make it easier for programmers who aren't infra experts to speed up their Rust programs via distribution. Here are some things that excite me about Turbolift's future. This list isn't a set of promises (as of writing I am still working on finalizing k8s support), but they show where my project interests lie: