Consider this scenario:
I have 4 files, 6 MB each. The HDFS block size is 64 MB. One block will hold all these files, with some space left over; if new files are added, they will be accommodated there.
Now, when the input splits are calculated for a MapReduce job by the InputFormat (split size is usually the HDFS block size, so that each split can be loaded into memory for processing, thereby reducing seek time), how many input splits are made here? Is it one, because all 4 files are contained within a single block? Or is it one input split per file? How is this determined? And what if I want all the files to be processed as a single input split?
You'll actually have 4 blocks. It doesn't matter whether all the files could fit into a single block or not.
EDIT: Blocks belong to a file, not the other way around. HDFS is designed to store large files that are almost certainly going to be larger than your block size. Storing multiple files per block would add unnecessary complexity to the namenode...
Instead of a block being just `blk0001`, it's now `blk0001 {file-start -> file-end}`.
Still 1 split per file. This is determined by the InputFormat: the default `FileInputFormat` computes splits file by file and never combines multiple files into one split.
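To make the per-file arithmetic concrete, here is a minimal, self-contained sketch of that split calculation. It is an approximation of what `FileInputFormat.getSplits` does, not the real implementation (it ignores block locations and the implementation's ~10% slop factor):

```java
// Sketch of FileInputFormat-style split counting: each file is split
// independently, so splits never cross file boundaries.
public class SplitSketch {
    public static void main(String[] args) {
        long blockSize = 64L << 20;                 // 64 MB HDFS block size
        long[] fileLengths = {6L << 20, 6L << 20,   // four 6 MB files
                              6L << 20, 6L << 20};

        long splitSize = blockSize;                 // default: split size == block size
        int splits = 0;
        for (long len : fileLengths) {
            // Each file contributes ceil(len / splitSize) splits --
            // at least one per non-empty file, regardless of how small it is.
            splits += (int) ((len + splitSize - 1) / splitSize);
        }
        System.out.println("splits = " + splits);   // prints: splits = 4
    }
}
```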
Use a different input format, such as `CombineFileInputFormat` (the older mapred API has `MultiFileInputFormat`), which packs many small files into each split.
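For example, a driver could be set up as below. This is a minimal sketch assuming the Hadoop 2.x MapReduce API and hypothetical `/input` and `/output` paths; `CombineTextInputFormat` is the concrete text-file subclass of `CombineFileInputFormat`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombineSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "combine-small-files");
        job.setJarByClass(CombineSmallFiles.class);

        // Pack many small files into each split instead of 1 split per file.
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Cap each combined split at 64 MB -- our four 6 MB files fit in one.
        CombineTextInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the max split size set to 64 MB, the four 6 MB files should end up in a single combined split, and hence a single map task, subject to how the combiner groups blocks by node and rack.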