Offsets for Kafka Direct Approach in Spark 1.3.1

I am implementing the "direct" approach for Kafka streaming in Spark 1.3.1 (https://spark.apache.org/docs/1.3.1/streaming-kafka-integration.html). As I understand it, `auto.offset.reset` can be set to one of two values: "smallest" or "largest". The behavior I am observing (please let me know if this is expected) is that "largest" starts fresh and receives any new incoming data, while "smallest" starts from offset 0 and reads to the end of the existing data but then does not receive any new incoming data. Clearly it would be preferable to start from the beginning and also keep receiving new data. I did see that the docs give access to the offsets each batch is consuming, but I'm not sure how that would help here. Thanks.
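
For reference, here is a minimal toy model of what the two `auto.offset.reset` values choose when the stream is not given explicit starting offsets. It is plain Python, not the Spark or Kafka API; `starting_offset` and its parameters are hypothetical names. Note that "smallest" means the oldest offset still retained in the partition's log, which is 0 only if retention has not yet deleted anything.

```python
# Toy model (NOT the Spark or Kafka API): how an auto.offset.reset value
# picks the first offset to read for one partition when no starting
# offset is supplied explicitly.
def starting_offset(reset: str, earliest: int, latest: int) -> int:
    """Return the first offset to consume.

    earliest: offset of the oldest message still retained in the log.
    latest:   offset the next produced message will receive.
    """
    if reset == "smallest":
        return earliest  # replay everything still retained
    if reset == "largest":
        return latest    # skip existing data, only see new messages
    raise ValueError(f"unsupported auto.offset.reset value: {reset!r}")


if __name__ == "__main__":
    # A partition whose retained messages occupy offsets 0..99.
    print(starting_offset("smallest", 0, 100))
    print(starting_offset("largest", 0, 100))
```

In this model the setting only decides where reading begins; it says nothing about whether new messages are picked up afterwards.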

Accepted answer (joebuild):

It looks like I was mistaken: with "smallest", the stream does continue past the end of the existing data and picks up new incoming messages.
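
A rough sketch of why both settings keep receiving new data, assuming each batch reads from where the previous batch stopped up to the latest offset available when the batch is planned. The reset value then only affects the very first batch. This is a simplified single-partition model in plain Python, not Spark code; `plan_batches`, `latest_per_batch`, and the optional `max_per_batch` cap are hypothetical names. The `(from, until)` pairs play the role of the per-batch offset ranges mentioned in the question.

```python
from typing import List, Optional, Tuple


# Toy model (NOT Spark code) of consecutive per-batch offset ranges for a
# single partition. Each range is half-open: [from_offset, until_offset).
def plan_batches(start: int,
                 latest_per_batch: List[int],
                 max_per_batch: Optional[int] = None) -> List[Tuple[int, int]]:
    ranges = []
    current = start
    for latest in latest_per_batch:
        # Read everything produced so far, optionally capped per batch.
        until = latest
        if max_per_batch is not None:
            until = min(latest, current + max_per_batch)
        ranges.append((current, until))
        current = until  # the next batch resumes exactly here
    return ranges


if __name__ == "__main__":
    # Log end offset observed at four successive batch times: 100 messages
    # already exist, then 10 and 15 more arrive, then nothing new.
    latest = [100, 110, 125, 125]
    print(plan_batches(0, latest))                     # "smallest"
    print(plan_batches(100, latest))                   # "largest"
    print(plan_batches(0, latest, max_per_batch=40))   # "smallest", capped
```

With a "smallest" start, the first range covers the entire backlog and later ranges cover only newly arrived messages, so new data is still read. One plausible reason it can look otherwise is that a large first batch takes a long time to process, delaying the later ones. The optional cap loosely mirrors Spark's `spark.streaming.kafka.maxRatePerPartition` setting, which, as I understand it, is a per-partition rate in messages per second rather than a per-batch count.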