We're running a web application that consists of several microservices, and each request ultimately has to call a third-party service, which is slow and typically takes a couple of seconds. We have a requirement to trace the processing logs across all services for each request, using a request traceId.
Our current implementation uses a thread-based concurrency model: in each service, a thread handles a request from beginning to end and blocks while waiting for the remote service's response. It's very natural to put the traceId into a ThreadLocal so that we can get it back whenever and wherever we need it.
But the thread-based concurrency model doesn't scale well, so we want to move to an NIO/event-driven model. We tried Netty and saw a very big performance improvement. With Netty, however, different phases of processing a request might be handled by different threads, which makes tracing the logs very tricky.
Our current considerations include:
- Pass the traceId as a method parameter. It's already in the request anyway, but it's very inconvenient if a deeply nested method needs it.
- Set the traceId into a ThreadLocal at the beginning of every callback. But personally I believe this approach is error-prone and could potentially introduce hard-to-find race-condition bugs.
So what's the sophisticated/elegant way to solve such a tracing problem in an NIO/event-driven model?
This is the Achilles' heel of all those Java EE frameworks out there trying to adapt to an async world (and why the existing ones never really will) - decades of storing state in ThreadLocals.
Basically, you need to tie the state you want to pass around to the channel or request you're processing, so that it's available to whatever code gets it next - and you cannot assume that will happen on the same thread.
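To make that concrete, here is a minimal sketch using only the JDK, not Netty itself. The `RequestContext` class, the `handle` method, and the two pool names are all made up for illustration. The context object is created once per request and handed from stage to stage, so the traceId survives even though the stages hop between thread pools. In Netty, the analogous move would probably be to hang such an object off the channel with `channel.attr(AttributeKey)`, but treat that mapping as an assumption about your setup.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TraceContextDemo {

    /** Hypothetical per-request state: it travels with the request, not with a thread. */
    public static final class RequestContext {
        public final String traceId;
        public final List<String> lines = new CopyOnWriteArrayList<>();

        public RequestContext(String traceId) {
            this.traceId = traceId;
        }

        /** Records a log line tagged with this request's traceId. */
        public void log(String message) {
            lines.add("[" + traceId + "] " + message);
        }
    }

    /**
     * Runs three processing phases on two different pools. Each phase receives
     * the context explicitly, so no phase depends on which thread it runs on.
     */
    public static CompletableFuture<RequestContext> handle(
            String traceId, ExecutorService ioPool, ExecutorService workerPool) {
        RequestContext ctx = new RequestContext(traceId);
        return CompletableFuture
                .supplyAsync(() -> { ctx.log("request received"); return ctx; }, ioPool)
                .thenApplyAsync(c -> { c.log("remote service called"); return c; }, workerPool)
                .thenApplyAsync(c -> { c.log("response written"); return c; }, ioPool);
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(2);
        ExecutorService workerPool = Executors.newFixedThreadPool(2);
        try {
            RequestContext ctx = handle("trace-42", ioPool, workerPool).join();
            ctx.lines.forEach(System.out::println);
        } finally {
            ioPool.shutdown();
            workerPool.shutdown();
        }
    }
}
```

Deeply nested code then takes the context (or just the traceId) from whatever request object it is already handed, instead of reaching for a ThreadLocal.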
Two ways to solve it:
Using ThreadLocals for this stuff only seems natural because our industry took a decade or two detour into making programs model I/O in ways that have nothing to do with what the computer is actually doing - though that sold a lot of hardware (async is far more like the interrupt handlers I was writing in 1983 or so) :-)