How can I tell Hadoop not to split my file in a particular MapReduce job?

  1. Suppose I have a file to process with Hadoop, and I know the file is smaller than the HDFS block size. Does this guarantee that the file will not be split, so that I don't need to write an InputSplit for it because the default one will not split it?

  2. Suppose a file written with SequenceFileOutputFormat (or some other output format) is larger than the block size but contains only one key-value pair. Does this imply that the file's blocks will be stored on the same node (apart from replicated copies), so that the MapReduce task will not waste too much time fetching them? Does it mean I don't need to write my own InputSplit, because the key will not be split (the key is smaller than the block size and there is only one key)?
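
To make the first question concrete, here is a minimal sketch in plain Java of the split arithmetic as I understand it from `FileInputFormat` in the `org.apache.hadoop.mapreduce` API: the split size is `max(minSize, min(maxSize, blockSize))`, and the last split may be up to 10% larger than that. The class `SplitSketch` and its method names are hypothetical and only model the calculation; they do not call Hadoop. The `splittable` flag stands in for what `isSplitable(JobContext, Path)` would return, and the sketch assumes a non-empty file.

```java
// Hypothetical model of how many input splits a single file would produce.
// It mirrors the split-size arithmetic of FileInputFormat but uses no Hadoop classes.
class SplitSketch {
    // Slack factor: the final split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for one file; 'splittable' plays the role of isSplitable(...).
    static int countSplits(long fileLength, long blockSize,
                           long minSize, long maxSize, boolean splittable) {
        if (fileLength <= 0) {
            throw new IllegalArgumentException("sketch covers non-empty files only");
        }
        if (!splittable) {
            return 1; // the whole file becomes a single split
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb;
        // File smaller than the block size, assumed default min/max split sizes.
        System.out.println(countSplits(64 * mb, block, 1L, Long.MAX_VALUE, true));
        // Same file, but with the maximum split size lowered to 16 MB.
        System.out.println(countSplits(64 * mb, block, 1L, 16 * mb, true));
        // File larger than the block size, splittable versus not splittable.
        System.out.println(countSplits(300 * mb, block, 1L, Long.MAX_VALUE, true));
        System.out.println(countSplits(300 * mb, block, 1L, Long.MAX_VALUE, false));
    }
}
```

Under this model, a file smaller than one block stays in a single split only while the maximum split size is not configured below the file size. Making the format report the file as not splittable is what removes the dependence on those settings.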


There are 0 answers