How to trace request processing when using NIO or an event-driven framework like Netty

We're running a web application that consists of several microservices. Each request ends with a call to a third-party service, which is slow and typically takes a couple of seconds. We are required to trace the processing logs for each request across all services, using a per-request traceId.

In the current implementation we use a thread-based concurrency model: in each service, one thread handles a request from beginning to end and blocks while waiting for the remote service's response. It is very natural to put the traceId into a ThreadLocal so that we can get it back whenever and wherever we need it.

But the thread-based concurrency model doesn't scale well, so we are moving to an NIO/event-driven model, and we saw a very big performance improvement when we tried Netty. With Netty, however, different phases of processing a single request may be handled by different threads, which makes tracing the logs very tricky.

Our current considerations include:

  • Pass the traceId as a method parameter. It's already in the request anyway, but this is very inconvenient when a deeply nested method needs it.
  • Set the traceId into a ThreadLocal at the beginning of every callback. Personally, I believe this approach is error-prone and could introduce hard-to-find race-condition bugs.
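
To make the second option concrete, here is a minimal sketch using only the JDK (no Netty). The names TraceContext, wrap and TraceContextDemo are hypothetical and chosen for illustration. Each callback is wrapped so that the ID is bound when it starts and restored in a finally block; the approach stays fragile because any callback that is not wrapped runs without an ID, or with a stale one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper: holds the trace ID for the code currently running on a thread.
final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    private TraceContext() {}

    static String current() {
        return TRACE_ID.get();
    }

    // Wraps a callback so the trace ID is bound before it runs and the
    // previous value is restored afterwards, even if the callback throws.
    static <T> Supplier<T> wrap(String traceId, Supplier<T> callback) {
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(traceId);
            try {
                return callback.get();
            } finally {
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }
}

final class TraceContextDemo {
    // Runs one wrapped callback on the pool and reports the trace ID it observed.
    static String handle(String traceId, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(TraceContext.wrap(traceId, () -> "trace=" + TraceContext.current()), pool)
                .join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(handle("req-42", pool));
            // An unwrapped task on the same worker thread sees no trace ID.
            System.out.println(CompletableFuture.supplyAsync(TraceContext::current, pool).join());
        } finally {
            pool.shutdown();
        }
    }
}
```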

So what is the elegant way to solve this tracing problem in an NIO/event-driven model?

There are 2 answers

1
Tim Boudreau On BEST ANSWER

This is the Achilles' heel of all those Java EE frameworks out there trying to adapt to an async world (and why the existing ones never really will): decades of storing state in ThreadLocals.

Basically, you need to tie the state you want to pass around to the channel or request you're processing, so that it's available to whatever code gets it next - and you cannot assume that will happen on the same thread.

Two ways to solve it:

  1. Channel.attr() - if the state can be tied to the connection, and the connection is only used for one thing at a time, create a static AttributeKey and pass it to Channel.attr(). You get back an Attribute whose value is initially null. Set it in your first handler, and everything after that can read it from there. Make sure you clear it when you know you're done if the connection will be reused without being closed, as with an HTTP keep-alive connection.
  2. Attach it to some object you decode - subclass the decoder for HTTP requests (if HTTP is what you're doing) so that it produces your own request subclass carrying an ID.
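
The second approach can be sketched without Netty, using only the JDK. Everything here is an assumption for illustration: TracedRequest stands in for whatever request object your decoder produces, and callRemote stands in for the slow third-party call. The point is that the ID travels with the object, so it does not matter which thread runs each stage.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical decoded request that carries its own trace ID.
final class TracedRequest {
    final String traceId;
    final String payload;

    TracedRequest(String traceId, String payload) {
        this.traceId = traceId;
        this.payload = payload;
    }
}

final class TracedPipeline {
    private TracedPipeline() {}

    // Stand-in for the remote call; it completes on a different thread.
    static CompletableFuture<String> callRemote(TracedRequest request) {
        return CompletableFuture.supplyAsync(() -> "echo:" + request.payload);
    }

    // The continuation reads the ID from the request it was handed,
    // not from the thread it happens to run on.
    static CompletableFuture<String> handle(TracedRequest request) {
        return callRemote(request)
                .thenApply(result -> "[" + request.traceId + "] " + result);
    }

    public static void main(String[] args) {
        System.out.println(handle(new TracedRequest("req-7", "ping")).join());
    }
}
```

The cost is the one the question already names: every method that needs the ID has to receive the request (or the ID) explicitly.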

Using ThreadLocals for this stuff only seems natural because our industry took a decade or two detour into making programs model I/O in ways that have nothing to do with what the computer is actually doing - though that sold a lot of hardware (async is far more like the interrupt handlers I was writing in 1983 or so) :-)

2
Frederic Brégier On

My 2 cents: if you're in the NIO/event-driven model, then you probably have to pass the "request id" from the caller to the callee, and then back to the caller (async/event-driven method). This then has nothing to do with the thread or channel ID (one channel could be reused for various queries, so that you don't pay the "connect" cost again and again).

Then on the caller side, you can use a map or similar (even a persisted one, through any persistence tool) to restore the context and do what you need to do.
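
A rough sketch of that caller-side map, in plain Java. PendingRequests and Context are hypothetical names, and the in-memory ConcurrentHashMap is just one possible backing store; the idea is only that the response carries the request id back, and the caller uses it to look up what it stored when it sent the request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical caller-side registry of requests that are still in flight.
final class PendingRequests {

    // Whatever the caller needs to resume work; here only the trace ID.
    static final class Context {
        final String traceId;

        Context(String traceId) {
            this.traceId = traceId;
        }
    }

    private final Map<String, Context> pending = new ConcurrentHashMap<>();

    // Called before the request is sent to the callee.
    void register(String requestId, Context context) {
        pending.put(requestId, context);
    }

    // Called from whichever thread delivers the response. Removes and
    // returns the stored context, or null if the id is unknown or was
    // already completed.
    Context complete(String requestId) {
        return pending.remove(requestId);
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        requests.register("r1", new Context("trace-abc"));
        Context restored = requests.complete("r1");
        System.out.println(restored.traceId);
    }
}
```

Removing the entry on completion keeps the map from growing; a real system would presumably also need a timeout for requests that never get a response.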