Does anyone know how to read a gzip file (gzip in the spoolSourceDirectory) in a Flume process?


If we want to read data from a spoolDir that contains gzip files, what should I change in the source of the Flume process? Is a customized EventDeserializer enough, or do I also need a new source type (e.g., a customized GzipSpoolDirectorySource instead of the default spooldir)?


There is 1 answer

Erik Schmiegelow On

OK, so if you don't want to unpack your GZIP files at the Flume level, that's actually quite easy. You can configure your Spool Dir source to use a BlobDeserializer:

https://flume.apache.org/FlumeUserGuide.html#event-deserializers
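
As a rough sketch, the source side of such an agent could look like the fragment below. The agent and component names (a1, r1, c1) and the spoolDir path are made up for illustration, and the deserializer class name and maxBlobLength property are taken from the user guide linked above, so check them against the Flume version you run.

```properties
# Hypothetical names: agent a1, source r1, channel c1.
a1.sources = r1
a1.channels = c1

# Spooling directory source watching the folder that receives the .gz files
# (the path is a placeholder).
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /var/spool/flume/gzip-in

# Emit each file as one event (a blob) instead of splitting it into lines.
a1.sources.r1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
# Maximum number of bytes read into a single event; it should be at least
# as large as your biggest .gz file.
a1.sources.r1.deserializer.maxBlobLength = 100000000

# A durable channel; each event is held whole, so size the channel accordingly.
a1.channels.c1.type = file
```

Because every file becomes a single event, this approach fits moderately sized files best; very large archives would need a correspondingly large maxBlobLength and channel capacity.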

This will read the entire file as one event and spool that. If you want to store it in HDFS, for instance, make sure that you activate the fileHeader property on your spool dir source. You can then use the %{file} variable in your path, which effectively lets you use Flume as a one-to-one file copy mechanism.
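
A minimal sketch of the HDFS side, again with hypothetical names (a1, r1, c1, k1) and a placeholder target path. The roll and serializer settings are my assumptions about what a byte-for-byte copy needs, not something the answer above spells out, so treat them as a starting point.

```properties
# Hypothetical names: agent a1, source r1, channel c1, sink k1.
# Attach the source file's absolute path to each event as the "file" header.
a1.sources.r1.fileHeader = true
a1.sources.r1.fileHeaderKey = file

a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# %{file} expands to the value of the "file" header. That value is the
# absolute source path, so it becomes part of the HDFS directory layout.
# (The spooldir source also has a basenameHeader option if only the
# file name is wanted.)
a1.sinks.k1.hdfs.path = /flume/archive/%{file}

# Assumed settings for a one-to-one copy: write raw bytes, no trailing
# newline, and roll after every event so that one input file maps to
# one output file.
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.serializer.appendNewline = false
a1.sinks.k1.hdfs.rollCount = 1
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
```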