Is it possible to use WebHDFS with Flume?

I would like to run a Flume agent outside of a Hadoop cluster, and want to know whether Flume can send messages into the cluster using WebHDFS.

If not, are there alternatives to WebHDFS? A multi-tiered Flume topology would still require me to have Flume agents running inside the Hadoop cluster.

There is 1 answer

soaptree

Flume agents can run on their own machines without being inside a Hadoop cluster, as long as the sink type is set to "hdfs".

I have a Flume agent writing Avro events to an HDFS sink, without being on a Hadoop cluster or using WebHDFS.

Here are its settings:

agent.sinks.sink1.channel = channel1
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://hadoopd1.x.y.z/day/id/
agent.sinks.sink1.hdfs.rollInterval = 300
agent.sinks.sink1.hdfs.fileType = DataStream
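agent.sinks.sink1.hdfs.writeFormat = Text
agent.sinks.sink1.hdfs.fileSuffix = .avro
# Only one serializer can be active. When a key is repeated in a properties file,
# the later line typically takes effect, so the avro_event alias is commented out.
# agent.sinks.sink1.serializer = avro_event
agent.sinks.sink1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder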
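
The settings above cover only the sink. A complete agent file also has to declare its components and wire a source and a channel to that sink. Below is a minimal sketch of the missing pieces, assuming an Avro source and a memory channel. The name source1, the bind address, the port, and the channel capacities are illustrative and not taken from the original answer. It also assumes the Hadoop client libraries are on the Flume classpath and that the machine can reach the cluster's NameNode and DataNodes over the network.

```properties
# Component lists for the agent named "agent" (sink1 and channel1 match the sink settings above)
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1

# Hypothetical Avro source receiving events from upstream clients or agents
agent.sources.source1.type = avro
agent.sources.source1.bind = 0.0.0.0
agent.sources.source1.port = 4141
agent.sources.source1.channels = channel1

# In-memory channel buffering events between the source and the HDFS sink
# (capacities are placeholder values; events are lost if the agent stops)
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 10000
agent.channels.channel1.transactionCapacity = 1000
```

With both parts saved in one file, the agent would be started with something like `flume-ng agent --name agent --conf conf --conf-file flume.conf`, where the file name is again a placeholder. A file channel could be swapped in for the memory channel if events must survive an agent restart.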