Turbolift: Is This a Good Idea?

by Dominic Burkart
October 22nd, 2020

Recently, I've been working on a framework to distribute code in Rust. It's called Turbolift, and it's designed to extract functions and run them as microservices. I started writing it because I wanted to write barely distributed code: code that generally doesn't need to be distributed, but has a few bottlenecks that demand a lot of CPU.

The usual instinct when yet another distribution library appears should probably be concern, trepidation, and dread. A lot of really smart people have devoted a lot of time to thinking about good interfaces and good internal abstractions for distributed systems, and most infrastructure engineers are still unhappy with the current state of affairs. So, as I started thinking about Turbolift, I was hesitant to start a project that would be cursed by massive domain complexity and a community rightfully wary of magic interfaces.

[Image: a child looks on at a blinding light in a dark forest. Caption: "Writing a distribution library in the dead of night."]

But, I still felt that there was a distinct need in the distributed ecosystem. Most popular frameworks, like Hadoop and Kubernetes, are specialized for a very specific kind of "big data": high-volume, high-availability, real-time data processing and storage for workloads that are, or fundamentally behave like, servers.

I wanted to build a simpler interface on top of these server-oriented systems, one for users who don't need high-availability designs or complex custom architectures. Just as Rayon provides a dead-simple interface for some specific kinds of parallelization, I felt there was an opportunity to make a dead-simple default distribution interface.

In other words, I think there are a lot of programs that could be barely distributed, and for each distribution platform it's easy to come up with smart default configurations that would work for a whole lot of applications. From scientific research to frame rendering, there are plenty of important problem spaces with obviously parallel tasks and very similar distribution needs.

Consider an example: you're writing a text-processing program in Rust, and you find that a single step (applying intense compression to a bunch of small strings) is taking up the vast majority of your CPU time. Since your application is CPU-bound and you have a fast network, it makes sense to distribute the work to other devices.
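To make that concrete, here's a minimal sketch of the bottleneck, assuming the flate2 crate for compression (the function and data here are illustrative, not from a real codebase):

```rust
use std::io::Write;

use flate2::write::GzEncoder; // assumes flate2 in Cargo.toml
use flate2::Compression;

/// The hot spot: heavy compression applied to each of many small strings.
fn compress(input: &str) -> Vec<u8> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::best());
    // Writing into an in-memory Vec<u8> can't fail in practice.
    encoder.write_all(input.as_bytes()).expect("in-memory write");
    encoder.finish().expect("in-memory finish")
}

fn main() {
    let strings = vec!["many", "small", "strings"];
    // In the real program, this loop dominates total CPU time.
    let compressed: Vec<Vec<u8>> = strings.iter().map(|s| compress(s)).collect();
    println!("compressed {} strings", compressed.len());
}
```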

To use Kubernetes, distributing this one function requires:

- refactoring the compression function into its own project;
- defining a Pod with an associated Docker container, so that the code and its environment are pinned down;
- a Deployment to control the Pod, so that the cluster can recover automatically if a node fails and handle changes in the number of requested Pods;
- a Service, to provide a single IP address that routes to an available Pod in the Deployment;
- and an autoscale daemon, to monitor the Deployment and requisition duplicate Pods if they receive enough traffic to use more than 80% of their allocated CPU time, or to remove excess Pods if the Service is being underutilized.

Then, the application needs to be refactored so that every time it finds a string that needs to be compressed, it sends a request to the Service's static IP and awaits the response.
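That last call-site refactor is worth seeing too. Here's a sketch of what each call turns into, assuming the reqwest crate; the Service address and endpoint are made up:

```rust
// Sketch of the refactored call site: every local call to compress()
// becomes a request to the Service. The URL below is a made-up example.
async fn compress_remote(input: &str) -> Result<Vec<u8>, reqwest::Error> {
    let client = reqwest::Client::new(); // in real code, reuse one Client
    let response = client
        .post("http://compression-service/compress") // the Service's stable address
        .body(input.to_string())
        .send()
        .await?
        .error_for_status()?; // surface 4xx/5xx responses as errors
    Ok(response.bytes().await?.to_vec())
}
```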

That's a lot of stuff to do every time you want to distribute a single function! Once you've decided to write a microservice, it's also pretty mechanical work with little variance, work that could be done automatically without forcing humans to write code. Programmers shouldn't need to restructure their projects, add microservice boilerplate, or (arguably) even know about Pods or Deployments in Kubernetes for these kinds of simple cases. Knowing that a Function can run locally or be distributed on a Distribution Platform is an acceptable level of abstraction for plenty of cases, and is a lot simpler than breaking apart the Platform internals and dealing with the five Kubernetes structures described above.

Aside from simplicity, a platform-agnostic distribution interface makes it much easier to switch between platforms, or to "de-distribute" the program (so that it can run locally in development, or so that distribution is an optional compilation feature of the application). A scientist or game developer shouldn't have to be a distribution expert to write a basic microservice for a computation-heavy part of their code.
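The "de-distribute" idea can be sketched concretely: if the interface keeps the function's signature stable, switching between local and remote execution can be as small as toggling a Cargo feature. Everything here is hypothetical, the feature name included:

```rust
// Hypothetical sketch of distribution as an optional compilation feature.
// `distributed` is an assumed Cargo feature, and `remote_compress` stands
// in for whatever client a framework like Turbolift would generate.

#[cfg(feature = "distributed")]
async fn compress(input: &str) -> Vec<u8> {
    remote_compress(input).await // generated microservice client (hypothetical)
}

#[cfg(not(feature = "distributed"))]
async fn compress(input: &str) -> Vec<u8> {
    local_compress(input) // same signature, runs in-process for development
}

fn local_compress(input: &str) -> Vec<u8> {
    input.as_bytes().to_vec() // placeholder for the real compression step
}
```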

So, Turbolift is designed to provide a default microservice solution for these easy distribution cases. It acts as an interface for distribution platforms like Kubernetes, and is designed to be extensible to totally different frameworks (think AWS Lambda or a cluster scheduler like SLURM), making switching between different distribution architectures easy. It's not rocket science, and it's not an abstraction that makes sense everywhere. But, it does make easy distribution problems much, much easier to distribute.
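For a sense of the intended ergonomics, here's a hypothetical sketch of what marking a function for distribution could look like. The attribute name and its argument are my own illustration, not Turbolift's documented API:

```rust
// Illustrative only: `distribute` and its `on` argument are assumptions
// about the shape of a Turbolift-style interface, not the real macro.

#[distribute(on = "kubernetes")] // extract this function into a microservice
async fn compress(input: String) -> Vec<u8> {
    // The body is unchanged from the local version; the framework handles
    // extraction, containerization, deployment, and request routing.
    input.into_bytes() // placeholder for the real compression
}

// The call site also stays the same whether the function runs locally
// or on the cluster:
//
//     let output = compress(text).await;
```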

I think that there is a need for something like Turbolift, and I've been enjoying working on it. As it starts to shape up, I'm excited to see if other people find it useful as well!

☄︎