I installed and configured Confluent Kafka, and the broker is running with a 1 GB heap:
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G" # from /bin/kafka-server-start
I created a topic “thing-data” with a single partition, and an automated job sends data to it every 5 seconds. Each message is around 2400 bytes.
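A quick back-of-the-envelope estimate of the write rate helps put the file sizes below in context. This Python sketch uses only the two figures above (about 2400 bytes every 5 seconds) and ignores Kafka's per-record and per-batch overhead, so real on-disk sizes will be somewhat larger.

```python
# Rough write-rate estimate for the "thing-data" topic.
# Assumes exactly one 2400-byte message every 5 seconds and ignores
# Kafka record/batch overhead, so on-disk sizes will be a bit larger.
MESSAGE_BYTES = 2400
INTERVAL_SECONDS = 5


def messages_in(seconds: int) -> int:
    """Number of messages produced in a window of `seconds`."""
    return seconds // INTERVAL_SECONDS


def payload_bytes_in(seconds: int) -> int:
    """Approximate payload bytes produced in a window of `seconds`."""
    return messages_in(seconds) * MESSAGE_BYTES


if __name__ == "__main__":
    for label, seconds in [("hour", 3600), ("day", 86400)]:
        print(f"per {label}: {messages_in(seconds)} messages, "
              f"{payload_bytes_in(seconds)} bytes")
```

At this rate the topic receives about 720 messages (roughly 1.7 MB of payload) per hour and 17,280 messages (roughly 41.5 MB) per day.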
The problem is that the smallest (earliest) offset of the topic changes far too often, which means Kafka retains only a few records at any given time. I looked at the sizes of the topic's log files in /var/log/kafka/thing-data-0/:
[hduser@laptop thing-data-0]$ ll
-rw-r--r--. 1 confluent confluent 10485760 Dec 30 17:05 00000000000000148868.index
-rw-r--r--. 1 confluent confluent 119350 Dec 30 17:05 00000000000000148868.log
[hduser@laptop thing-data-0]$ ll
-rw-r--r--. 1 confluent confluent 10485760 Dec 30 17:08 00000000000000148928.index
-rw-r--r--. 1 confluent confluent 54901 Dec 30 17:08 00000000000000148928.log
[hduser@laptop thing-data-0]$ ll
-rw-r--r--. 1 confluent confluent 10485760 Dec 30 17:12 00000000000000148988.index
-rw-r--r--. 1 confluent confluent 38192 Dec 30 17:13 00000000000000148988.log
As you can see, the log segments roll over very frequently. Each time, the old files are renamed with a .deleted suffix and removed after the configured delay.
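Kafka names each segment file after the offset of its first record (its base offset), so the three listings can be turned into a rough roll interval. This sketch assumes one offset per message, one message every 5 seconds as described earlier, and that the listings show consecutive segments; the base offsets are copied from the file names above.

```python
# Estimate how often the active segment rolled, using the base offsets
# taken from the segment file names in the three directory listings.
# Assumes one offset per message and one message every 5 seconds.
INTERVAL_SECONDS = 5

base_offsets = [148868, 148928, 148988]  # from 00000000000000148868.log etc.


def offsets_between_rolls(offsets):
    """Differences between consecutive segment base offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


def seconds_between_rolls(offsets):
    """Estimated seconds between rolls at one message per INTERVAL_SECONDS."""
    return [d * INTERVAL_SECONDS for d in offsets_between_rolls(offsets)]


if __name__ == "__main__":
    print(offsets_between_rolls(base_offsets))
    print(seconds_between_rolls(base_offsets))
```

Each new segment starts 60 offsets after the previous one, which under these assumptions means a roll roughly every 300 seconds (5 minutes).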
These are the log-related settings from /etc/kafka/server.properties (I also tried log.retention.ms instead of log.retention.hours):
log.roll.hours=168
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
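For comparison, this sketch works out what these settings would predict at the message rate described earlier. It assumes the broker really applies the values shown (no log.roll.ms and no topic-level overrides in effect) and again ignores record overhead.

```python
# Predict segment behaviour from the server.properties values in the question,
# assuming one 2400-byte message every 5 seconds, no record overhead,
# and no log.roll.ms or topic-level overrides in effect.
MESSAGE_BYTES = 2400
INTERVAL_SECONDS = 5

LOG_ROLL_HOURS = 168
LOG_RETENTION_HOURS = 168
LOG_SEGMENT_BYTES = 1073741824


def seconds_to_fill_segment() -> int:
    """Seconds until a segment reaches log.segment.bytes at this rate."""
    messages_per_segment = -(-LOG_SEGMENT_BYTES // MESSAGE_BYTES)  # ceiling division
    return messages_per_segment * INTERVAL_SECONDS


def min_retained_messages() -> int:
    """Messages covered by the retention window (a lower bound in steady state)."""
    return LOG_RETENTION_HOURS * 3600 // INTERVAL_SECONDS


if __name__ == "__main__":
    fill = seconds_to_fill_segment()
    roll = LOG_ROLL_HOURS * 3600
    print(f"size limit after ~{fill / 3600:.0f} h; time limit after {roll / 3600:.0f} h")
    print("first roll trigger:", "time" if roll < fill else "size")
    print("messages retained (at least):", min_retained_messages())
```

Under those assumptions a segment would hit the 1 GiB size limit only after about 621 hours, so the 168-hour time limit should trigger the roll first, and retention should keep at least about 120,960 messages rather than a few dozen.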
After restarting Kafka, the files look like this:
-rw-r--r--. 1 confluent confluent 10485760 Dec 30 17:21 00000000000000149099.index
-rw-r--r--. 1 confluent confluent 0 Dec 30 17:21 00000000000000149099.log
I suspect the .index file size, because it is at the maximum (the default value of segment.index.bytes is 10485760), and the cluster had been working fine for almost a month.
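The index size on its own may not point to a problem: as far as I know, Kafka preallocates the active segment's offset index to segment.index.bytes and trims it when the segment rolls, so a 10485760-byte .index next to a small .log is expected for the active segment. The arithmetic below estimates how much log data such an index could cover, assuming 8-byte offset-index entries and the default index.interval.bytes of 4096 (both are assumptions to verify against your Kafka version).

```python
# Rough capacity of a preallocated offset index.
# Assumptions: 8 bytes per offset-index entry (4-byte relative offset
# + 4-byte file position) and the default index.interval.bytes=4096,
# i.e. about one index entry per 4096 bytes appended to the .log file.
SEGMENT_INDEX_BYTES = 10485760  # default segment.index.bytes
ENTRY_BYTES = 8
INDEX_INTERVAL_BYTES = 4096


def max_index_entries() -> int:
    """Entries that fit in a fully preallocated index file."""
    return SEGMENT_INDEX_BYTES // ENTRY_BYTES


def approx_log_bytes_covered() -> int:
    """Approximate .log bytes appended before the index would fill up."""
    return max_index_entries() * INDEX_INTERVAL_BYTES


if __name__ == "__main__":
    print("index entries:", max_index_entries())
    print("log bytes covered (approx.):", approx_log_bytes_covered())
```

That is roughly 5 GiB of log data, far more than the 119350-byte .log file in the listings, so a full index seems an unlikely reason for these rolls.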
I'm not sure what is going wrong; any help would be appreciated.
Some of the references I consulted:
http://kafka.apache.org/documentation/
https://stackoverflow.com/questions/28586008/delete-message-after-consuming-it-in-kafka
Did you check log.roll.ms? This is the primary setting. By default it has no value, but if it is present it overrides log.roll.hours.
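To check which roll and retention values the broker would actually apply, a small script can resolve the precedence between the millisecond, minute and hour variants: log.roll.ms over log.roll.hours, and log.retention.ms over log.retention.minutes over log.retention.hours, each falling back to 168 hours. This is a simplified sketch: its parser only handles key=value or key:value lines and full-line # or ! comments (not every Java .properties rule), and topic-level overrides such as segment.ms or retention.ms are not considered. The log.roll.ms=60000 value in the usage example is purely hypothetical.

```python
# Resolve the effective segment-roll and retention times from server.properties.
# Simplified parser: key=value (or key:value) lines, full-line '#'/'!' comments.
# Topic-level overrides (segment.ms, retention.ms) are NOT considered.
import re

MS_PER_MINUTE = 60_000
MS_PER_HOUR = 3_600_000
DEFAULT_MS = 168 * MS_PER_HOUR  # broker default of 168 hours


def parse_properties(text):
    """Return a dict of key -> raw string value."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or full-line comment
        parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def _first_set(props, keys_and_factors):
    """Value (in ms) of the first key present, in precedence order."""
    for key, factor in keys_and_factors:
        if key in props:
            return int(props[key]) * factor
    return None


def effective_roll_ms(props):
    value = _first_set(props, [("log.roll.ms", 1),
                               ("log.roll.hours", MS_PER_HOUR)])
    return DEFAULT_MS if value is None else value


def effective_retention_ms(props):
    # A value of -1 (no time limit) is passed through unchanged.
    value = _first_set(props, [("log.retention.ms", 1),
                               ("log.retention.minutes", MS_PER_MINUTE),
                               ("log.retention.hours", MS_PER_HOUR)])
    return DEFAULT_MS if value is None else value


if __name__ == "__main__":
    # Settings from the question, plus a hypothetical log.roll.ms line.
    sample = "log.roll.hours=168\nlog.retention.hours=168\nlog.roll.ms=60000\n"
    props = parse_properties(sample)
    print("roll ms:", effective_roll_ms(props))
    print("retention ms:", effective_retention_ms(props))
```

With only the settings from the question the roll interval resolves to 604800000 ms (168 hours); adding any log.roll.ms line replaces that value entirely, so it is worth searching the whole server.properties for it.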