pcap to Avro on Hadoop


Is there any way to convert a pcap file to Avro, so that I can write a MapReduce program over the Avro data using Hadoop?

Otherwise, what is the best practice for dealing with pcap files on Hadoop?

Thanks


There is 1 answer

AudioBubble On BEST ANSWER

A pcap file is a collection of records, each containing a time stamp, a field giving the packet's original length, a field giving the amount of data for that packet actually captured and saved, and an unstructured blob of raw packet data.
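
To make that record layout concrete, here is a minimal Python sketch that walks the records of a classic pcap file. It assumes the microsecond-resolution format (magic number 0xa1b2c3d4, a 24-byte file header, and a 16-byte header before each packet); it does not handle pcapng or the nanosecond-resolution variant. The function name read_pcap_records and the in-memory sample are made up for illustration.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond time stamps


def read_pcap_records(stream):
    """Yield (ts_sec, ts_usec, orig_len, data) for each record in a pcap stream."""
    file_header = stream.read(24)
    if len(file_header) < 24:
        raise ValueError("truncated pcap file header")
    # The magic number reveals the byte order the file was written in.
    if struct.unpack("<I", file_header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", file_header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a microsecond-resolution pcap file")
    # Per-record header: seconds, microseconds, captured length, original length.
    record_header = struct.Struct(endian + "IIII")
    while True:
        header = stream.read(record_header.size)
        if len(header) < record_header.size:
            return  # end of file (or a truncated trailing record)
        ts_sec, ts_usec, incl_len, orig_len = record_header.unpack(header)
        data = stream.read(incl_len)  # the unstructured blob of packet bytes
        yield ts_sec, ts_usec, orig_len, data


# Hypothetical sample: a file header plus one record with 4 captured bytes
# out of a packet that was originally 60 bytes long.
sample = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
sample += struct.pack("<IIII", 1, 500, 4, 60) + b"\x01\x02\x03\x04"
records = list(read_pcap_records(io.BytesIO(sample)))
```

Note that all this gives you is the time stamp, the two lengths, and raw bytes; nothing about the packet's contents has been interpreted yet.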

The Avro documentation says:

Avro provides:

  • Rich data structures.

....

An "unstructured blob of raw packet data" and "rich data structures" don't go together. To get structured data on which you can do processing, you'll have to parse the raw packet data, the same way that implementations of the protocols in the packet do, and the same way that tcpdump, Wireshark, and other protocol analyzers do.

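As a rough illustration of what "parse the raw packet data" means, the Python sketch below pulls a few fields out of one packet blob and returns a dict shaped like an Avro record. It assumes the capture's link-layer type is Ethernet, that there are no VLAN tags, and that only IPv4 is of interest; anything else is skipped. The schema PACKET_SCHEMA, its field names, and the function summarize_ipv4 are hypothetical, chosen only for this example.

```python
import socket
import struct

# Hypothetical Avro schema (Avro schemas are written as JSON) for the
# handful of fields this sketch extracts.
PACKET_SCHEMA = {
    "type": "record",
    "name": "PacketSummary",
    "fields": [
        {"name": "ts_sec", "type": "long"},
        {"name": "ts_usec", "type": "long"},
        {"name": "src_ip", "type": "string"},
        {"name": "dst_ip", "type": "string"},
        {"name": "ip_protocol", "type": "int"},
    ],
}


def summarize_ipv4(ts_sec, ts_usec, data):
    """Return a dict matching PACKET_SCHEMA, or None if not Ethernet/IPv4."""
    if len(data) < 14 + 20:  # Ethernet header + minimal IPv4 header
        return None
    ethertype = struct.unpack("!H", data[12:14])[0]
    if ethertype != 0x0800:  # 0x0800 is the EtherType for IPv4
        return None
    ip = data[14:34]
    if ip[0] >> 4 != 4:  # high nibble of the first byte is the IP version
        return None
    return {
        "ts_sec": ts_sec,
        "ts_usec": ts_usec,
        "src_ip": socket.inet_ntoa(ip[12:16]),
        "dst_ip": socket.inet_ntoa(ip[16:20]),
        "ip_protocol": ip[9],  # e.g. 6 for TCP, 17 for UDP
    }


# Hypothetical frame: zeroed MAC addresses, IPv4 EtherType, then a 20-byte
# IPv4 header carrying TCP from 10.0.0.1 to 10.0.0.2.
frame = (
    b"\x00" * 12
    + b"\x08\x00"
    + bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 6, 0, 0])
    + bytes([10, 0, 0, 1])
    + bytes([10, 0, 0, 2])
)
summary = summarize_ipv4(1, 500, frame)
```

Dicts like `summary` are the kind of structured record an Avro writer could serialize; which fields belong in the schema depends entirely on the analysis you want to run.
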
So, first, you need to figure out what you're trying to do here. What sort of analysis do you want to do? What packet data do you want to process? Packet time stamps? Source and destination IP addresses? Protocols within a packet? Something in a particular protocol?