I have a tar archive (about 40 GB) containing many subfolders within which my data resides. The structure is: Folders -> Sub Folders -> json.bz2 files. TAR file details:
Total size: ~40 GB
Number of inner .bz2 files (arranged in folders): 50,000
Size of one .bz2 file: ~700 KB
Size of one extracted JSON file: ~6 MB
I have to load the JSON files into the HDFS cluster. I tried manually extracting the archive into my local directory, but I am running out of space. I am now planning to load the archive directly into HDFS and then uncompress it there, but I don't know whether that is a good way to solve the problem. As I am new to Hadoop, any pointers would be helpful.
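For reference, this is a minimal sketch of the streaming approach I was considering as an alternative, so that nothing has to be fully extracted to local disk first. The archive path, the HDFS target directory, and the assumption that the hdfs CLI is on my PATH are all placeholders, not my actual setup.

```python
import bz2
import os
import subprocess
import tarfile

ARCHIVE = "data.tar"           # hypothetical path to the 40 GB archive
HDFS_TARGET = "/user/me/json"  # hypothetical HDFS destination directory

# Open the tar in streaming mode ('r|') so members are read sequentially
# and the archive is never extracted to local disk as a whole.
with tarfile.open(ARCHIVE, mode="r|") as tar:
    for member in tar:
        if not member.isfile() or not member.name.endswith(".json.bz2"):
            continue
        compressed = tar.extractfile(member).read()  # ~700 KB in memory
        data = bz2.decompress(compressed)            # ~6 MB of JSON

        # Mirror the folder structure under HDFS_TARGET, dropping the .bz2 suffix.
        dest = os.path.join(HDFS_TARGET, member.name[: -len(".bz2")])
        subprocess.run(
            ["hdfs", "dfs", "-mkdir", "-p", os.path.dirname(dest)], check=True
        )
        # 'hdfs dfs -put -' reads the file contents from stdin, so nothing
        # is written locally.
        subprocess.run(["hdfs", "dfs", "-put", "-", dest], input=data, check=True)
```

One thing that worries me about this sketch is that it spawns a couple of hdfs processes per file, which will be slow for 50,000 files, so I am not sure it is the right approach either.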