Some of the HDFS sink files are not closed

Some people say that when the sink fails to close a file because of a problem such as a timeout, it does not try to close that file again.

I have been looking through my Flume log file, but there are no errors. However, the log shows that in every cycle Flume creates two .tmp files and closes only one of them.

Any suggestion for the config would be appreciated! Thanks!

#Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#Configure the Kafka Source
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 1000
#a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = 150.2.237.16:6667,150.2.237.17:6667
a1.sources.r1.kafka.topics = 1-sysmaster1-thread
a1.sources.r1.kafka.consumer.group.id = flume_hdfs_consumer

#Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/flume/kafka-data/1-sysmaster1-thread/%y%m%d
a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M

#Sink settings to avoid encoding problems
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

#Sink settings to avoid creating too many HDFS files
### Roll a file after a certain number of events ###
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.batchSize = 1000

#Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

#Use File channel
#a1.channels.c1.type = file
#a1.channels.c1.checkpointDir = /home/bigdata/flume/checkpoint
#a1.channels.c1.dataDirs = /home/bigdata/flume/data

#Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

23 4월 2019 11:47:04,105 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1147.1555987622865.tmp
23 4월 2019 11:48:03,382 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:48:03,457 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp
23 4월 2019 11:48:08,664 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp
23 4월 2019 11:48:08,689 INFO  [hdfs-k1-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683383
23 4월 2019 11:48:08,712 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1148.1555987683384.tmp
23 4월 2019 11:49:03,711 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:49:03,806 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp
23 4월 2019 11:49:05,439 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp
23 4월 2019 11:49:05,460 INFO  [hdfs-k1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743712
23 4월 2019 11:49:05,480 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1149.1555987743713.tmp
23 4월 2019 11:50:02,354 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:57)  - Serializer = TEXT, UseRawLocalFileSystem = false
23 4월 2019 11:50:02,387 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp
23 4월 2019 11:50:03,015 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp
23 4월 2019 11:50:03,032 INFO  [hdfs-k1-call-runner-4] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355.tmp to /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1150.1555987802355

[[email protected] logs]# hdfs dfs -ls /user/flume/kafka-data/1-sysmaster1-thread/190423/
Found 163 items
-rw-r--r--   3 root hdfs    1781109 2019-04-23 11:20 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1120.1555986001199
-rw-r--r--   3 root hdfs     212118 2019-04-23 11:20 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1120.1555986001200.tmp
-rw-r--r--   3 root hdfs    1777270 2019-04-23 11:21 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1121.1555986062575
-rw-r--r--   3 root hdfs      54451 2019-04-23 11:21 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1121.1555986062576.tmp
-rw-r--r--   3 root hdfs    1781741 2019-04-23 11:22 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1122.1555986123181
-rw-r--r--   3 root hdfs      34735 2019-04-23 11:22 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1122.1555986123182.tmp
-rw-r--r--   3 root hdfs    1782315 2019-04-23 11:23 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1123.1555986183768
-rw-r--r--   3 root hdfs      28682 2019-04-23 11:23 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1123.1555986183769.tmp
-rw-r--r--   3 root hdfs    1782437 2019-04-23 11:24 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1124.1555986244304
-rw-r--r--   3 root hdfs     211547 2019-04-23 11:24 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1124.1555986244305.tmp
-rw-r--r--   3 root hdfs    1782775 2019-04-23 11:25 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1125.1555986302891
-rw-r--r--   3 root hdfs      35918 2019-04-23 11:25 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1125.1555986302892.tmp
-rw-r--r--   3 root hdfs    1781180 2019-04-23 11:26 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1126.1555986362097
-rw-r--r--   3 root hdfs      30967 2019-04-23 11:26 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1126.1555986362098.tmp
-rw-r--r--   3 root hdfs    1781682 2019-04-23 11:27 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1127.1555986423432
-rw-r--r--   3 root hdfs      41381 2019-04-23 11:27 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1127.1555986423433.tmp
-rw-r--r--   3 root hdfs    1781710 2019-04-23 11:28 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1128.1555986483928
-rw-r--r--   3 root hdfs     211240 2019-04-23 11:28 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1128.1555986483929.tmp
-rw-r--r--   3 root hdfs    1785456 2019-04-23 11:29 /user/flume/kafka-data/1-sysmaster1-thread/190423/1-sysmaster1-thread-1129.1555986542442

1 Answer

user3473222 On

I found the problem...

I had set the file prefix to include the time (hour and minute):

a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M

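As you can see from the resulting paths, a new file name is started every minute. The file that is still open when the minute changes apparently never reaches rollCount, so it is never rolled and stays behind as a .tmp file.

After I removed that line from the config file, it works fine.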
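
For reference, a minimal sketch of how the sink section might look after the change. Variant 1 reflects the fix described above (the time-based prefix line is disabled, so Flume falls back to its default file prefix). Variant 2 is an assumption that is not part of the original post: it keeps the prefix and relies on the HDFS sink property hdfs.idleTimeout to close files that are no longer written to. The value 60 (seconds) is only illustrative.

```properties
# Variant 1 (the fix from the answer): disable the time-based prefix so the
# file name no longer changes every minute and rollCount can close each file.
#a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 10000

# Variant 2 (assumption, not tested in the original post): keep the prefix
# and let the sink close files that have been idle for 60 seconds.
#a1.sinks.k1.hdfs.filePrefix = 1-sysmaster1-thread-%H%M
#a1.sinks.k1.hdfs.idleTimeout = 60
```

With Variant 2, files left over from a finished minute should be closed and renamed once they have been idle for the timeout, at the cost of still producing at least one file per minute.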