What flume.conf parameters should I set to save tweets to a single FlumeData file per hour?


We are saving tweets in a directory structure like /user/flume/2016/06/28/13/FlumeData... but each hour Flume creates more than 100 FlumeData files. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize to 52428800 (50 MB), but the same thing happened. After that I also tried changing the rollCount parameter, but that didn't work either. How can I set the parameters to get one FlumeData file per hour?

mgurcan (best answer)

I resolved this problem by setting the flume.conf parameters rollInterval = 3600, rollCount = 0 and batchSize = 100, as @vkgade suggested.

ViKiG

What about rollInterval? Did you set it to zero? If so, the issue might be something else. rollInterval, rollSize and rollCount are independent triggers: each one that is non-zero is active, and the file is rolled as soon as the first of them is reached. So if rollInterval is set (its default is 30 seconds, which on its own gives about 120 files per hour), the file can rotate before it ever reaches rollSize. Also check the HDFS block size you set; if it is too small, even that might cause the file to roll.

Try this:

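    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    # One directory per hour: .../tweets/YYYY/MM/DD/HH
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    # Number of events written to the file before it is flushed to HDFS
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100

    # 0 disables size-based and count-based rolling...
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
    # ...so only the timer rolls the file, every 3600 seconds (one hour)
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600

    TwitterAgent.channels.MemChannel.type = memory
    # Maximum number of events held in the channel
    TwitterAgent.channels.MemChannel.capacity = 1000
    # Maximum events per transaction; should be at least hdfs.batchSize
    TwitterAgent.channels.MemChannel.transactionCapacity = 100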
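
To see why the roll settings matter, here is a rough Python model of the three roll triggers. It is a simplified sketch, not Flume's HDFS sink code: it assumes events arrive in time order, treats the %H path escape as one bucket per hour, and applies the rule that any non-zero trigger (rollInterval, rollSize, rollCount) closes the file as soon as it is reached. The names RollConfig and count_files, and the event stream, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RollConfig:
    # Defaults follow the Flume user guide; 0 disables that trigger.
    roll_interval: int = 30  # hdfs.rollInterval, seconds
    roll_size: int = 1024    # hdfs.rollSize, bytes
    roll_count: int = 10     # hdfs.rollCount, events


def count_files(events, cfg):
    """Simplified model (not Flume's code) of files written per hourly bucket.

    events: (timestamp_seconds, size_bytes) pairs in time order.
    """
    files = {}    # hour bucket -> number of files opened
    current = {}  # hour bucket -> [open_time, event_count, byte_count]
    for ts, size in events:
        bucket = ts // 3600  # stands in for the %H escape in hdfs.path
        cur = current.get(bucket)
        # Time trigger: the file was closed roll_interval seconds after it opened.
        if cur is not None and cfg.roll_interval > 0 and ts - cur[0] >= cfg.roll_interval:
            cur = None
        if cur is None:
            cur = [ts, 0, 0]
            current[bucket] = cur
            files[bucket] = files.get(bucket, 0) + 1
        cur[1] += 1
        cur[2] += size
        # Count and size triggers: checked after each write.
        if (cfg.roll_count > 0 and cur[1] >= cfg.roll_count) or (
            cfg.roll_size > 0 and cur[2] >= cfg.roll_size
        ):
            del current[bucket]
    return files


# Hypothetical stream: one 500-byte tweet every 5 seconds for one hour.
events = [(t, 500) for t in range(0, 3600, 5)]

# rollSize raised to 50 MB, other roll settings left at their defaults:
print(count_files(events, RollConfig(roll_size=52428800)))  # {0: 120}
# rollInterval = 3600, rollSize = 0, rollCount = 0:
print(count_files(events, RollConfig(3600, 0, 0)))  # {0: 1}
```

In this model, raising rollSize alone changes nothing because the 30-second default rollInterval fires first; only disabling size and count rolling and stretching the interval to an hour leaves one file per hourly directory.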

mgurcan

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    # Flush to HDFS after every single event
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1

    # With rollSize = 0 and rollInterval = 0, only rollCount is active:
    # a new FlumeData file is started every 10 events
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000
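
In this configuration hdfs.rollInterval and hdfs.rollSize are both 0, so rollCount is the only trigger and each file is closed after 10 events. A quick estimate of how many files that yields per hourly directory is sketched below; the function name files_per_hour and the tweet rate are hypothetical, not taken from the post.

```python
import math


def files_per_hour(tweets_per_hour: int, roll_count: int) -> int:
    """Hypothetical helper: files per hourly directory when rollCount is the only trigger.

    Assumes hdfs.rollInterval = 0 and hdfs.rollSize = 0, so each file is closed
    after exactly roll_count events; any leftover events form one partial file.
    """
    if roll_count <= 0:
        raise ValueError("roll_count must be positive when it is the only active trigger")
    return math.ceil(tweets_per_hour / roll_count)


# Hypothetical rate of 1,200 tweets per hour with hdfs.rollCount = 10:
print(files_per_hour(1200, 10))  # 120
```

Under that assumption, any rate above 1,000 tweets per hour already produces more than 100 files per directory, which would be consistent with the symptom described in the question.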