Slow producer when running Kafka consumer and producer from the same JVM

2.2k views

I am using Kafka 0.8 and spring-integration-kafka 1.2.0.RELEASE.

I have two topics, primary and secondary. I need to consume from the primary topic and, after some processing, produce to the secondary topic for a further stage of processing to be done later.

While consuming from the primary topic works fine, producing to the secondary topic starts failing after a few minutes. It starts with send requests to Kafka timing out after the 500 ms timeout I have configured, and ends with the thread pool being exhausted.
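For reference, the 500 ms timeout mentioned above comes from the producer configuration. A rough sketch of what such a setup might look like with the old 0.8 producer API is below; the broker addresses, property values, and the choice of sync mode are assumptions for illustration, not taken from my actual config.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

public class SecondaryTopicProducerSetup {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Illustrative broker list; replace with the real cluster addresses.
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // The send timeout the question refers to (how long the broker may wait for acks).
        props.put("request.timeout.ms", "500");
        props.put("producer.type", "sync");

        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        producer.close();
    }
}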

If I produce the events for the secondary topic to a different Kafka cluster instead, it works without any problem.

I have 4 consumers running, and both topics have 200 partitions each.

I am a little new to Kafka, so please excuse any lack of knowledge. Please comment if there is any missing information I should be providing.

2

There are 2 answers

0
Vaibhav On BEST ANSWER

Finally found the problem after trying out all the configurations possible.

By mistake I forgot to remove the dependency below, which had been added earlier for the consumer integration.

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.9.0.0</version>
</dependency>
It was causing some conflict while producing, which left threads stuck in a wait state. If anybody can explain what kind of conflict this can introduce, that would be good learning.
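One quick way to see which kafka-clients (or kafka) JAR a class is actually being loaded from at runtime, and so spot this kind of classpath conflict, is to ask the class for its code source. This is a generic sketch using plain Java reflection; KafkaProducer is just an example of a class to probe, and the manifest version may be null if the JAR does not declare one.

import org.apache.kafka.clients.producer.KafkaProducer;

public class KafkaClasspathCheck {
    public static void main(String[] args) {
        // The JAR (or directory) the producer class was actually loaded from.
        System.out.println("Loaded from: "
                + KafkaProducer.class.getProtectionDomain().getCodeSource().getLocation());

        // Implementation-Version from that JAR's manifest, if present.
        System.out.println("Version: "
                + KafkaProducer.class.getPackage().getImplementationVersion());
    }
}

Running mvn dependency:tree is another way to see whether two different Kafka client versions are being pulled onto the classpath.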

Thanks.

3
Oliver Dain On

It's a little hard to know with the information provided, but I suspect the issue is that you can consume from the first topic and compute results faster than you can produce to the secondary topic. There could be lots of reasons why this might happen. For example, perhaps the writes to the secondary topic aren't as well distributed across partitions. Similarly, producing to a different cluster might succeed for a variety of reasons, including faster machines, more machines, better networking, etc.

The basic issue isn't really Kafka specific: if you're consuming from one source and sending that data to a 2nd sink, you often can't assume the 2nd sink will always be faster than the first source. Whenever the 2nd sink is slower, even by a tiny bit, you'll eventually hit a problem like this. For example, say you can read 100 events/second from the primary source but the secondary sink can only absorb 99 events/second. That means each second you end up with 1 more event in memory waiting to be sent to your sink. If you don't do anything to slow down the rate at which you read from the primary source, you'll run out of RAM, threads, or some other resource.

The general solution is some kind of throttling. For example, you might use a Semaphore that starts with 500 permits: that would mean you could never read more than 500 items from the primary source that you haven't yet successfully sent to the sink. Before reading an item from the primary source you'd acquire a permit, so that if you're already "ahead of" the secondary by 500 items your reader blocks. Each time you successfully send an item to your secondary topic you release a permit, allowing another read to proceed.
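A minimal sketch of that throttling idea, assuming the newer kafka-clients producer API (the question uses an older client, so treat the class names and the processing hook as illustrative):

import java.util.Properties;
import java.util.concurrent.Semaphore;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ThrottledRelay {
    // Never get more than 500 unacknowledged events ahead of the secondary topic.
    private final Semaphore inFlight = new Semaphore(500);
    private final Producer<String, String> producer;

    public ThrottledRelay(Properties producerProps) {
        this.producer = new KafkaProducer<>(producerProps,
                new StringSerializer(), new StringSerializer());
    }

    // Call this for each event read from the primary topic (hypothetical hook).
    public void relay(String event) throws InterruptedException {
        inFlight.acquire();                  // blocks the reader once 500 sends are pending
        String result = process(event);      // hypothetical processing step
        producer.send(new ProducerRecord<>("secondary", result), (metadata, exception) -> {
            inFlight.release();              // a permit comes back when the send completes
            if (exception != null) {
                exception.printStackTrace(); // real code would retry or alert instead
            }
        });
    }

    private String process(String event) {
        return event;                        // placeholder for the real transformation
    }
}

The key point is that a permit is only released from the send callback, so a slow or failing secondary topic automatically slows the reads from the primary topic instead of piling up work in memory.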

I would caution against fixes like using a 2nd Kafka cluster or something else that works but doesn't really address the core problem. For example, if producing to a different cluster works now, it won't when that cluster slows down due to the loss of a node, a big rebalance, etc. That'd just hide the issue temporarily.