How to Implement Tracking IDs for Rails Logs in Karafka Server?


I'm developing a Rails application where I'm using the Karafka gem alongside the Rails server. To enhance traceability, I've successfully configured Rails to include a tracking ID for incoming HTTP requests by adding the following code snippet to each config/environments/{env}.rb file:

config.log_tags = [ :request_id ]

However, I'm encountering challenges in implementing a similar tracking ID mechanism for logs within the Karafka server. Is there a middleware solution or another approach that would allow me to generate a UUID and use it as a tracking ID for all logs in the Karafka server? Some of the code and logging is shared between my Kafka consumers and my HTTP request handling. Any insights or code examples would be greatly appreciated. Thank you!

Below is my karafka.rb file:

class KarafkaApp < Karafka::App
  kafka_config = Settings.kafka
  max_payload_size = 7_000_000 # Setting max payload size as 7MB

  setup do |config|
    config.kafka = {
      'bootstrap.servers': ENV['KAFKA_BROKERS_SCRAM'],
      'max.poll.interval.ms': 1200000
    }
    config.client_id = kafka_config['client_id']
    config.concurrency = 4
    # Recreate consumers with each batch. This will allow Rails code reload to work in the
    # development mode. Otherwise Karafka process would not be aware of code changes
    config.consumer_persistence = !Rails.env.development?

    config.producer = ::WaterDrop::Producer.new do |producer_config|
      # Use all the settings already defined for consumer by default
      producer_config.kafka = ::Karafka::Setup::AttributesMap.producer(config.kafka.dup)

      # Alter things you want to alter
      producer_config.max_payload_size = max_payload_size
      producer_config.kafka[:'message.max.bytes'] = max_payload_size
    end
  end

  routes.draw do
    topic :some_topic_1 do
      consumer SomeTopic1Consumer
    end

    ...
  end
end

I'm using Rails v7.0.5 and Karafka v2.1.11

1 Answer

Answered by Maciej Mensfeld

Karafka author here. Thank you for using my OSS. Karafka and Puma/Rails are different by nature. With Rails, you have atomic requests to which you can easily assign a single trace ID and use it down the road.

This is not the same with Karafka. Karafka is an event-streaming / batch-processing framework and operates under different assumptions. There is no single "entrypoint" per message. You can implement tracing similar to the one that is already in the community-supported DataDog integration:

https://karafka.io/docs/Monitoring-and-Logging/#tracing-consumers-using-datadog-logger-listener

It operates at the job level and reports extensive information to Datadog.

Is there a middleware solution or another approach that would allow me to generate a UUID and utilize it as a tracking ID for all logs in the Karafka server?

To be precise, the answer is "there never will be one," because some of Karafka's logs come from multiple background threads that are not (directly) related to or influenced by the incoming data. There are also a few more threads for dispatching, housekeeping, etc.

Speaking about tracing, I guess your goal would also be to trace the dispatches, right? If that is so, WaterDrop has a labeling API for tracing dispatches: https://karafka.io/docs/WaterDrop-Usage/#labeling
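As a hedged sketch of how a dispatch-side tracking ID could look with the labeling API from the linked docs (the helper name `dispatch_with_tracking` is hypothetical, and you should verify the `label:` key against your WaterDrop version):

```ruby
require 'securerandom'

# Hypothetical helper: attach a tracking UUID to an outgoing message using
# WaterDrop's labeling API (the `label:` message key described in the linked
# docs). The label travels with the delivery handle and its delivery report,
# so instrumentation callbacks can log the same tracking ID later.
def dispatch_with_tracking(producer, topic:, payload:)
  tracking_id = SecureRandom.uuid
  # Log tracking_id here with your logger, then dispatch with the label attached
  producer.produce_async(topic: topic, payload: payload, label: tracking_id)
end
```

The same `tracking_id` can then be read back from the delivery report in WaterDrop instrumentation events, tying the broker acknowledgment to the originating log lines.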

Karafka has all the APIs needed to provide data-originating tracing for both consumption and production of messages, but the scope of operations has to be precisely defined and narrowed.

The best place to start is the Karafka Monitoring API https://karafka.io/docs/Monitoring-and-Logging/ and writing your own listener that injects the needed context prior to processing, so your logger can reuse this data.
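A minimal sketch of such a listener, assuming `Rails.logger` is an `ActiveSupport::TaggedLogging` instance; the `on_worker_process` / `on_worker_processed` method names follow the listener convention used by the DataDog example linked above, so verify the event names against your Karafka version:

```ruby
require 'securerandom'

# Sketch of a listener that pushes a per-job UUID onto the Rails tagged
# logger before a worker processes a job and pops it afterwards, so all log
# lines emitted during processing carry the same tracking ID.
class KafkaTracingListener
  # Invoked before a worker starts processing a job
  def on_worker_process(_event)
    # Tags are stored per-thread, so concurrent workers do not clash
    Rails.logger.push_tags(SecureRandom.uuid)
  end

  # Invoked after the worker has finished the job
  def on_worker_processed(_event)
    Rails.logger.pop_tags
  end
end

# Then, e.g. at the bottom of karafka.rb:
#   Karafka.monitor.subscribe(KafkaTracingListener.new)
```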

P.S. As I recall, there are a few users who have done this type of work and are available on Karafka Slack: https://slack.karafka.io/. It may be worth asking there.