Designing the system used to import 16M (and counting) assets into Air
This piece was written by Madison Brown, a Senior Software Engineer at Air
At Air, a core pillar of the service we offer is the ability to effortlessly move content into our platform. As the majority of new users already have content stored in a competing product, it’s essential that we be able to integrate with a wide variety of these competitors in order to obtain that content efficiently and reliably.
In the app, this service is exposed via the “import modal”, which currently allows users to perform self-serve imports from 3 sources — Google Drive, Dropbox and Box. Behind the scenes, we’ve also integrated with 7 additional sources, migrations from which are offered to Enterprise customers as a custom service. To date, this system has successfully moved over 16M assets, totaling just over 275TB in size, into our platform.
Needless to say, managing the intricacies of each service and its corresponding API within a single, extensible interface has been challenging, and the system has undergone several revisions since its inception. In this article, I’ll present a deep dive into one particular challenge — rate limiting — and the solution we ultimately used to overcome it.
Early iterations of the import service were plagued by rate-limit errors, for several reasons. Since we started by integrating with larger competitors offering generous rate limits, it was not immediately necessary to implement a robust solution for managing them; as we expanded in breadth, however, it became clear that the rate-limiting schemes we would encounter were wide-ranging and, in some cases, highly restrictive. Beyond differences in the rate itself, some APIs calculate limits per application while others calculate per user, some express limits as requests per minute while others use a daily quota, and some provide retry-after headers while others do not.
The initial approach was simply to handle rate-limit errors with an exponential backoff algorithm. While an excellent failsafe, this alone was insufficient for our use case, for two reasons. First, consider an API that imposes a daily request quota. Relying on exponential backoff to reactively manage rate limits means the system may become unavailable for extended periods should the quota be exhausted early in the day. Second, that quota may be imposed at the application level —i.e. requests from all users may contribute to the same quota. In that case we also encounter a fairness issue: if one user imports a large amount of content, the service may become unavailable for every other user of the application.
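To make the reactive approach concrete, here is a minimal sketch in Python. The `RateLimitError` class and the shape of the `request` callable are hypothetical stand-ins for illustration, not code from our system; real API clients surface 429 responses and Retry-After headers in client-specific ways.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when an API responds with HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, parsed from Retry-After if present


def with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Call `request`, retrying rate-limit errors with full-jitter backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of retries; let the caller decide what to do
            # Prefer the server's Retry-After hint when one is provided;
            # otherwise sleep a random interval up to base * 2^attempt.
            delay = err.retry_after or random.uniform(
                0, min(max_delay, base_delay * 2 ** attempt))
            time.sleep(delay)
```

Note that everything here is local to one process: the algorithm learns nothing about the quota until a request has already failed, which is exactly the limitation described above.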
We had also attempted to ameliorate this issue by throttling requests to varying degrees depending on the API in question; however, this approach again proved insufficient. Unavoidably, requests to a given API had to be made from multiple processes. At the time, file downloads were handled by individual Lambda instances while the imports themselves ran on an Elastic Beanstalk server, which could of course scale to multiple nodes (we’ve since moved to containerized workloads running on ECS, but the problem remains the same). Because the number of processes making requests to a given API, even on behalf of a single user, was highly variable, purely localized throttling could not proactively manage a global rate limit.
Those familiar with the implementation of rate limiting to protect an API may at this point be thinking of algorithms like token bucket; however, such algorithms are actually not helpful in our scenario — since they only tell us whether a request can be made right now, they provide no more information than simply making the request and checking whether it succeeds. Rather, the problem at hand can be described as follows:
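To see why, here is a toy token bucket (a generic sketch, not code from our system). Its only output is a yes/no answer about the present instant; it says nothing about *when* the next action should be taken, which is the question a distributed consumer of someone else's rate limit actually needs answered.

```python
import time


class TokenBucket:
    """Classic token bucket. `allow()` answers only "may I act right now?" —
    a denial carries no schedule for when to try again."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with capacity 2 permits two immediate actions and then starts refusing — useful for *enforcing* a limit on your own API, but no help in coordinating many independent processes against someone else's.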
"Given an action that may or may not succeed due to a known rate limit, and given an unknown number of actors attempting to perform said action, when is the optimum time for any given actor to take action, such that the global limit is never exceeded at any time resolution, nor is it severely underutilized?"
Provided this context, one apparent solution might be to implement a centralized coordinating service. For example, participants could continuously make requests to the coordinator, which would respond on a first-in-first-out basis at the appropriate rate, notifying participants of when to execute an action. Of course, the coordinator itself would also need to scale to multiple instances, thereby leaving us with the same fundamental problem: maintaining a queue of actions and executing those actions at a controlled rate across multiple processes in a distributed system.
To avoid the overhead of maintaining an independent coordinating service, we decided to take a novel approach inspired by the concept of self-organization.
Ants working together to build a bridge.
Self-organization…is a process where some form of overall order arises from local interactions between parts of an initially disordered system. The process can be spontaneous when sufficient energy is available, not needing control by any external agent. It is often triggered by seemingly random fluctuations, amplified by positive feedback. The resulting organization is wholly decentralized, distributed over all the components of the system. As such, the organization is typically robust and able to survive or self-repair substantial perturbation. [ref]
A classic example of a self-organizing system is a colony of ants, in which global order arises spontaneously via the emission and interpretation of chemical messages by each individual member. While far less complex, the algorithm we ultimately developed follows the same fundamental principle: participants emit messages (in our case, via Redis pub/sub) signaling their intent to perform an action, and simultaneously analyze the messages emitted by peer participants to determine when to emit said messages.
Provided that all nodes follow the same algorithm, the repetition of this cycle rapidly gives rise to equilibrium, with messages being emitted at a consistent rate by exactly one actor at a time. Meanwhile, as more actors connect to a given pool — a group of actors sharing a rate limit — the rate of action at each node in that pool decreases, such that all participants share the global rate limit equally.
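To illustrate the principle — and only the principle; the production algorithm's details differ — here is a toy simulation. An in-memory `Bus` stands in for a Redis pub/sub channel, and each `Node` derives its own pace by counting the peers it has heard from recently and taking an equal share of the pool's rate. Unlike the real system, this sketch can briefly overshoot while nodes are still discovering one another.

```python
class Bus:
    """In-memory stand-in for a Redis pub/sub channel (assumption: one
    channel per pool of actors sharing a rate limit)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, msg):
        for callback in self.subscribers:
            callback(msg)


class Node:
    """A participant that announces each action and paces itself based on
    the announcements of its peers."""

    def __init__(self, node_id, bus, pool_rate, window=1.0):
        self.id = node_id
        self.bus = bus
        self.pool_rate = pool_rate  # shared requests-per-second budget
        self.window = window        # seconds before a silent peer is presumed gone
        self.last_seen = {}         # peer id -> timestamp of last message
        self.next_slot = 0.0
        bus.subscribe(self.observe)

    def observe(self, msg):
        self.last_seen[msg["id"]] = msg["at"]

    def active_peers(self, now):
        # Ourselves, plus every peer heard from within the window.
        return 1 + sum(1 for pid, at in self.last_seen.items()
                       if pid != self.id and now - at < self.window)

    def try_act(self, now):
        """Act iff our equal share of the pool rate allows it right now."""
        if now < self.next_slot:
            return False
        my_rate = self.pool_rate / self.active_peers(now)
        self.next_slot = now + 1.0 / my_rate
        self.bus.publish({"id": self.id, "at": now})
        return True
```

Driving two nodes with a simulated clock against a 10-requests-per-second pool, each settles to roughly 5 requests per second, and the combined rate stays near — never far above — the shared budget, with no central coordinator involved.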
The system is not perfect; while the specified rate limit is guaranteed not to be exceeded, whenever a node joins or leaves a pool there is a short period during which the system must auto-correct and no requests will be made, meaning that the target rate limit will always be slightly under-utilized. There’s also a theoretical upper limit to the request rate it can manage, derived from the average delay in pub/sub message transmission. In our case, this upper limit falls somewhere around 200 requests per second—much greater than the rate limit imposed by any API we integrate with, though it could be a limiting factor for other use cases of the algorithm.
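As a back-of-the-envelope check on that ceiling — the latency figure below is an assumption chosen for illustration, not a measurement — one way to arrive at a number in that ballpark: if each action must propagate to all peers before the next one can safely be scheduled, consecutive actions cannot be spaced more tightly than one average pub/sub delay.

```python
# Assumed average pub/sub message propagation delay (illustrative only).
avg_pubsub_delay_s = 0.005  # ~5 ms

# Actions cannot be spaced closer than one propagation delay, so the
# manageable global rate is bounded by its reciprocal.
max_global_rate = 1 / avg_pubsub_delay_s  # requests per second
```

A ~5 ms average delay yields a bound of roughly 200 requests per second, consistent with the figure above; a slower or noisier transport would lower it proportionally.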
Finally, it’s not a replacement for proper exponential backoff and honoring of retry-after headers, as some APIs we integrate with do not advertise their rate limits at all; however, the system has proven to be a highly effective tool for ensuring that our import services are available at all times, to all users.