I have the following problem:
I need to build a high-performing, multi-threaded HTTP server that can process large amounts of data with very low latency.
The overall data set is very large (10+ GB), but most requests will only need access to a subset of it. Waiting on database access would be too slow, so the data must be held in memory.
Each web request only performs read operations on the data; however, there is a background worker thread responsible for periodically updating the data.
My basic approach:
I've chosen the Actix Web server as it has a good feature set and seems to perform best in the benchmarks I've looked at.
The main idea I have is to load all the data on boot into some shared state, into a data structure that is heavily optimised for the read operations.
Then I want to provide some kind of interface that each request handler can use to query that data and get immutable references to different parts of it depending on what it needs.
This should avoid race conditions (since only the worker thread has write access) as well as expensive data-copying operations.
Architecture A
My original approach was to create this data inside a module:
static mut DATA: Option<ProgramData> = None;
Then expose public methods for accessing it, but after reading enough warnings about static memory I have abandoned that approach.
Architecture B
This is what I have currently working. I create an empty struct like this (where ProgramData is a custom struct) in the program's main function:
struct ProgramDataWrapper {
    data_loaded: bool,
    data: ProgramData,
}
Then I pass an Arc<RwLock> smart pointer to a DataService (which is responsible for asynchronously loading the data and managing refreshes over time), and another clone of the Arc becomes the Actix web state.
So the data should persist for the lifetime of the program, because the main function always holds a reference to it and it should never be dropped.
I have implemented public methods on this struct to enable querying the data and to get back different parts of it depending on the input parameters to the HTTP Request.
I then pass a clone of the Arc<RwLock> into the Actix web state so that every handler has read-only access to it and can query the data through the public functions (the internal data is not public).
The route handler does this by dereferencing the Arc, obtaining a read lock from the RwLock, and then calling a method such as is_ready().
So, for example, I have a /ready endpoint that returns true/false to the load balancer to signal that the data is in memory and this instance is ready to start receiving requests.
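In simplified form, the wiring looks something like this (ProgramData, the Default impl on ProgramDataWrapper, and the DataService::new/run calls are just stand-ins for my real types, and the actual loading logic is omitted; the worker runs on a plain thread here for brevity):

use std::sync::{Arc, RwLock};
use actix_web::{get, web, App, HttpServer, Responder};

#[get("/ready")]
async fn ready(state: web::Data<RwLock<ProgramDataWrapper>>) -> impl Responder {
    // Read lock only, so many handlers can hold this at the same time.
    let guard = state.read().unwrap();
    web::Json(guard.is_ready()) // is_ready() assumed to return a bool
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let shared = Arc::new(RwLock::new(ProgramDataWrapper::default()));

    // One clone goes to the background DataService (loading + refreshes)...
    let for_worker = Arc::clone(&shared);
    std::thread::spawn(move || DataService::new(for_worker).run());

    // ...and another clone becomes the Actix web state.
    let app_state = web::Data::from(shared);
    HttpServer::new(move || App::new().app_data(app_state.clone()).service(ready))
        .bind(("0.0.0.0", 8080))?
        .run()
        .await
}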
I've noticed, though, that when the worker thread takes a write lock on the data structure, no other route handler can access it: they all block, and the entire application freezes until the update completes. This is because the entire ProgramDataWrapper struct is locked, including its public methods.
I think I could get around this by putting the RwLock on the ProgramData object itself, so that while the worker thread is assembling the new data, route handlers can still take read locks and use the public interface.
Then, once the new data is ready, the worker only needs the write lock for a short time to copy in the new pieces before releasing it immediately.
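Roughly, I picture the worker's refresh looking like this (load_program_data is a stand-in for however the new data is actually assembled, and a whole-value assignment stands in for whatever partial merge I end up doing):

use std::sync::{Arc, RwLock};
use std::time::Duration;

// Build the replacement data without holding any lock, then take the write
// lock only for the swap itself so readers are barely blocked.
fn refresh_loop(data: Arc<RwLock<ProgramData>>) {
    loop {
        // Potentially slow: assemble the new data set off to the side.
        let new_data: ProgramData = load_program_data();

        // Short critical section: readers only wait for this assignment.
        *data.write().unwrap() = new_data;

        std::thread::sleep(Duration::from_secs(60));
    }
}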
Architecture C
The other idea I had is to use mpsc channels.
When I create the DataService, it can create a sender/receiver pair, keep the receiving end, and pass the sending half back to the main function, which can then clone the sender into the Actix web state so that every route handler has a way to send queries to the DataService.
What I was thinking, then, is to create a struct like this:
struct TwoWayData<T, U> {
    query: T,
    callback: std::sync::mpsc::Sender<U>,
}
Then, in the route handler, I can create a sender/receiver pair for the response type.
I can send a message to the DataService (because I have access to a clone of the sender from the main function) and include in the payload the sender the service should use to reply to the route handler.
Something like:
#[get("/stuff")]
pub async fn data_ready(data: web::Data<Arc<Sender<TwoWayData<DataQuery, DataResponse>>>>) -> impl Responder {
let (sx, rx): (Sender<TwoWayData<DataQuery, DataResponse>>, Receiver<TwoWayData<DataQuery, DataResponse>>) = channel();
data.send(TwoWayData {
query: "Get me some data",
callback: sx.clone(),
});
}
Then the data service can just listen to incoming messages, extract the query and process it, and send the result back down the channel it has just received.
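On the service side I'm picturing roughly this shape (in this variant the service owns the data; the queries/data fields and process_query are placeholders for my real types):

use std::sync::mpsc::{channel, Receiver, Sender};

struct DataService {
    queries: Receiver<TwoWayData<DataQuery, DataResponse>>,
    data: ProgramData,
}

impl DataService {
    // The service keeps the receiver; main clones the sender into the web state.
    fn new(data: ProgramData) -> (Self, Sender<TwoWayData<DataQuery, DataResponse>>) {
        let (tx, rx) = channel();
        (DataService { queries: rx, data }, tx)
    }

    fn run(&self) {
        // iter() blocks until a message arrives and ends once all senders drop.
        for msg in self.queries.iter() {
            let response = self.process_query(&msg.query);
            // Ignore the error case where the handler has already given up.
            let _ = msg.callback.send(response);
        }
    }

    fn process_query(&self, _query: &DataQuery) -> DataResponse {
        todo!("look up the relevant slice of self.data")
    }
}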
My Question
If you're still with me, I really appreciate that.
My questions are these:
Is there a large overhead to mpsc channels that will slow my program down when communicating large amounts of data over them?
Is it even possible to send the callback in the way I want to allow two-way communication? And if not, what is the accepted way of doing this?
I know this is a matter of opinion, but which of these two approaches is a more standard way of solving this type of problem, or does it just come down to personal preference / the technical requirements of the problem?
A. I would disregard static mut entirely since it is unsafe and easy to get wrong. The only way I would consider it is as static DATA: RwLock<ProgramData>, but then it is the same as option B except it is less flexible for testing, discrete data sets, etc.

B. Using an Arc<RwLock> is a very common and understandable pattern, and I would consider it my first option when sharing mutable data across threads. It is also a very performant option if you keep your write critical section small. You may reach for some other concurrent data structure if it's infeasible to clone the whole dataset for each update and in-place updates are long and/or non-trivial. At 10+ GB of data, I'd have to take a good look at your data, access, and update patterns to decide on a "best" course of action. Perhaps you can use many smaller locks within your structure, or use a DashMap, or a combination thereof. There are many tools available, and you may need to build something custom if you're striving for the lowest latency.

C. This looks a bit convoluted, but glossing over the specifics it is pretty much an "actor model", or at least based on the principles of message passing. If you want the data to behave as a separate "service" that can govern itself and provides more control over how the queries are processed, you can use an actor framework like Actix (originally built for Actix Web, but they've since drifted apart enough that there's no longer any meaningful relation). I personally don't use actors since they tend to be an obscuring layer of abstraction, but it's up to you. It will likely be slower than accessing the data directly, and you'll still need to decide on an internal concurrency mechanism as mentioned above.