Big data live data streaming using Flume


I am trying to analyze Twitter data using Flume. I collected the files from Twitter with Flume in BigInsights, but the data I received is in a compressed Avro format, which is not human-readable. Can anyone tell me how to convert these files to readable JSON so that I can do some analysis on them?

Alternatively, is there a way to receive the data in readable JSON format in the first place?
Thanks in advance.

This is the data I received: [image]


There is 1 answer

alpeshpandya

The Avro format is not designed to be human-readable; it is designed to be consumed by programs. You still have a few options to view this data or, even better, analyze it.

Create a Hive table: This option lets you analyze the data with SQL queries, Spark SQL, Spark notebooks, and visualization tools such as Tableau and Excel. Your table creation script will look like this:

CREATE TABLE twitter_data
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{...

In the avro.schema.literal property you can also define your own schema.
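As a rough illustration of what a schema literal could contain, the Python sketch below builds an Avro record schema as JSON and wraps it in a TBLPROPERTIES clause. The record name, the field names, and the helper functions are hypothetical placeholders, not the schema Flume actually wrote; the real field list should be taken from your own Avro files.

```python
import json

# Hypothetical subset of tweet fields (name, Avro type); an assumption for
# illustration only. Replace with the fields found in your own Avro files.
TWEET_FIELDS = [
    ("id", "string"),
    ("text", "string"),
    ("created_at", "string"),
    ("retweet_count", "long"),
]


def build_schema_literal(record_name, fields):
    """Return an Avro record schema as a JSON string.

    Every field is declared as a union with null so that missing values
    in the data do not break reads.
    """
    schema = {
        "type": "record",
        "name": record_name,
        "fields": [
            {"name": name, "type": ["null", avro_type], "default": None}
            for name, avro_type in fields
        ],
    }
    return json.dumps(schema)


def tblproperties_clause(schema_literal):
    """Wrap a schema literal in a Hive TBLPROPERTIES clause."""
    if "'" in schema_literal:
        # Keep the sketch simple: refuse input that would need escaping.
        raise ValueError("schema literal must not contain single quotes")
    return "TBLPROPERTIES ('avro.schema.literal'='" + schema_literal + "')"


literal = build_schema_literal("Tweet", TWEET_FIELDS)
clause = tblproperties_clause(literal)
print(clause)
```

The printed clause can be pasted at the end of a CREATE TABLE statement like the one above, once the field list matches your data.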

Write a program: If you are a developer and prefer to wrangle data in code, Avro libraries exist for many languages, so you can read and parse the Avro files and write them out as JSON.
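To make the programmatic route concrete, here is a minimal sketch in Python that decodes an Avro object container file and emits one JSON document per record. It uses only the standard library and follows the container layout from the Avro specification (magic bytes, metadata map, sync marker, data blocks). It assumes the file uses the `null` or `deflate` codec; other codecs such as snappy are not handled. The function names are my own, and in practice an existing Avro library for your language is likely the better choice.

```python
import io
import json
import struct
import zlib

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(buf):
    """Decode one zigzag, variable-length integer (Avro int and long)."""
    shift, acc = 0, 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro data")
        acc |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return (acc >> 1) ^ -(acc & 1)
        shift += 7


def read_bytes(buf):
    """Read a length-prefixed byte string."""
    return buf.read(read_long(buf))


def read_datum(buf, schema, named):
    """Decode one value; `named` caches record/enum/fixed types by name."""
    if isinstance(schema, list):  # union: branch index, then the value
        return read_datum(buf, schema[read_long(buf)], named)
    if isinstance(schema, str):
        schema = {"type": schema}
    kind = schema["type"]
    if not isinstance(kind, str):  # nested schema inside "type"
        return read_datum(buf, kind, named)
    if kind in named:  # reference to an earlier named type
        return read_datum(buf, named[kind], named)
    if kind in ("record", "enum", "fixed"):
        named[schema["name"]] = schema
    if kind == "null":
        return None
    if kind == "boolean":
        return buf.read(1) != b"\x00"
    if kind in ("int", "long"):
        return read_long(buf)
    if kind == "float":
        return struct.unpack("<f", buf.read(4))[0]
    if kind == "double":
        return struct.unpack("<d", buf.read(8))[0]
    if kind == "string":
        return read_bytes(buf).decode("utf-8")
    if kind == "bytes":
        return read_bytes(buf)
    if kind == "fixed":
        return buf.read(schema["size"])
    if kind == "enum":
        return schema["symbols"][read_long(buf)]
    if kind == "record":
        return {f["name"]: read_datum(buf, f["type"], named)
                for f in schema["fields"]}
    if kind in ("array", "map"):
        items = []
        while True:
            count = read_long(buf)
            if count == 0:
                break
            if count < 0:  # negative count is followed by a byte size
                count = -count
                read_long(buf)
            for _ in range(count):
                if kind == "map":
                    key = read_bytes(buf).decode("utf-8")
                    items.append((key, read_datum(buf, schema["values"], named)))
                else:
                    items.append(read_datum(buf, schema["items"], named))
        return dict(items) if kind == "map" else items
    raise ValueError("unsupported Avro type: %r" % kind)


def read_avro_container(data):
    """Yield each record stored in an Avro container file given as bytes."""
    buf = io.BytesIO(data)
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = read_datum(buf, {"type": "map", "values": "bytes"}, {})
    schema = json.loads(meta["avro.schema"].decode("utf-8"))
    codec = meta.get("avro.codec", b"null")
    sync = buf.read(16)
    named = {}
    while buf.tell() < len(data):
        count = read_long(buf)
        block = read_bytes(buf)
        if codec == b"deflate":
            block = zlib.decompress(block, -15)  # raw deflate, no zlib header
        elif codec != b"null":
            raise ValueError("unsupported codec: %r" % codec)
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch")
        block_buf = io.BytesIO(block)
        for _ in range(count):
            yield read_datum(block_buf, schema, named)


def avro_to_json_lines(data):
    """Return the records as newline-delimited JSON text."""
    return "\n".join(
        json.dumps(rec, default=lambda b: b.decode("latin-1"))
        for rec in read_avro_container(data)
    )
```

To use it, read one of the Flume output files in binary mode and pass the bytes to `avro_to_json_lines`, then write the result to a `.json` file. Newline-delimited JSON is chosen here because most analysis tools can load it one record at a time.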