How should a Kafka HLC figure out the # of partitions for a topic?


I'm using the kafka-node HighLevelConsumer, and am having problems where I always receive duplicate messages on startup.

In order to maintain processing sequence, my consumer simply appends messages to a work queue, and I process the events serially. I pause the consumer if I hit a queue high-water mark, I have auto-commit disabled, and I commit "manually" after my client code fully processes each event.
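The queue-and-pause pattern described above can be sketched roughly like this (a sketch, not my actual code: the high-water mark value is arbitrary, and the `pause`/`resume` hooks stand in for the consumer's `pause()` and `resume()` calls):

```javascript
// Minimal sketch of a serial work queue with a high-water mark.
// `pause` and `resume` stand in for consumer.pause()/consumer.resume();
// `highWaterMark` is an illustrative threshold, not a kafka-node setting.
function makeWorkQueue(pause, resume, highWaterMark) {
  var queue = [];
  var busy = false;
  var paused = false;

  function drain() {
    if (busy) return;
    var msg = queue.shift();
    if (!msg) {
      // Backlog is empty again: let the consumer flow resume.
      if (paused) { paused = false; resume(); }
      return;
    }
    busy = true;
    // Process strictly one at a time; commit only after the
    // handler signals success (auto-commit stays disabled).
    msg.process(function done() {
      busy = false;
      drain();
    });
  }

  return {
    push: function (msg) {
      queue.push(msg);
      if (queue.length >= highWaterMark && !paused) {
        paused = true;
        pause(); // stop the consumer until the backlog drains
      }
      drain();
    }
  };
}
```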

Despite committing, on startup, I always get the last (previously committed) message from one or more partitions (depending on how many other HLCs are running in my group). I was a little surprised that the HLC wouldn't give me (committed+1) but I decided to just "ignore" messages that had an offset earlier than the offset committed. As a quick test,

offset.fetchCommits('fnord', [{ topic: 'test', partition: 0 },
                              { topic: 'test', partition: 1 },
                              { topic: 'test', partition: 2 },
                              { topic: 'test', partition: 3 }], function (err, data) {
  // ...
});

This works if my payload list matches the number of partitions defined. If I exceed the number of partitions, I get a [BrokerNotAvailableError: Could not find the leader] error.
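The "ignore anything at or below the committed offset" workaround mentioned above can be a small pure function. This is a sketch; it only assumes that kafka-node message events carry `topic`, `partition`, and `offset` fields:

```javascript
// Tracks the highest offset handled per topic-partition and flags
// replays, so the startup duplicate can be skipped before processing.
function makeDuplicateFilter() {
  var lastSeen = {}; // "topic:partition" -> highest offset handled

  return function isDuplicate(msg) {
    var key = msg.topic + ':' + msg.partition;
    if (lastSeen[key] !== undefined && msg.offset <= lastSeen[key]) {
      return true; // at or below an offset we already handled: skip it
    }
    lastSeen[key] = msg.offset;
    return false;
  };
}
```

Seeding `lastSeen` from the `fetchCommits` result at startup would then drop the redelivered message without touching the rest of the pipeline.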

  1. Am I correct that I can't use auto-commit if I want a stronger guarantee that I won't lose messages when my message processing is asynchronous and may fail (e.g. an ETL job)? kafka-node just emits a 'message' event; there's no way to confirm that it was handled successfully.
  2. Is it expected behavior that the HighLevelConsumer will read the message of the last committed offset (i.e. a duplicate) rather than the next offset?
  3. What is the best way to get the number of partitions for a topic?

There is 1 answer

Answered by Jolly Roger:

I dug into the kafka-node source, and there's an undocumented call I was able to use to get the partition info:

client.loadMetadataForTopics(['test'], function (err, results) { /* ... */ });

(I don't love calling something that doesn't appear to be a documented part of the public API, and I'm uncomfortable with the rather raw-feeling mixed array nature of the returned results, but it solves my problem for the moment.)
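For what it's worth, the partition count can be dug out of that mixed array. In the results I've seen, the second element has a `metadata` property keyed by topic name, then by partition id; since this shape is undocumented, treat it as an assumption and guard accordingly:

```javascript
// Extracts the partition count for a topic from loadMetadataForTopics()
// results. Assumes results looks like
//   [brokerInfo, { metadata: { topicName: { 0: {...}, 1: {...}, ... } } }]
// -- an undocumented shape, so verify against the kafka-node version in use.
function partitionCount(results, topicName) {
  var topicMeta = results && results[1] && results[1].metadata &&
                  results[1].metadata[topicName];
  if (!topicMeta) return 0;
  return Object.keys(topicMeta).length; // one key per partition id
}
```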