How are keys, values, and records delimited in Hadoop streaming, typedbytes, and/or rawbytes

Question

393 views Asked by ChaseMedallion At 20 August 2012 at 00:59

I understand that text records in Hadoop streaming are delimited by the newline character and that there is a configurable delimiter between keys and values (defaults to tab).

1) The structure of the rawbytes format suggests that no record or key/value delimiters are necessary, but can someone confirm that this is the case?

2) In the typedbytes format, how are keys and values delimited, and how are records delimited?

3) Also, how are keys sorted in the typedbytes and rawbytes formats?

There is 1 answer

**piccolbo** · Answer 1 · 2014-01-09T19:45:42+00:00

1) Correct.

2) Length information in the header makes delimiters unnecessary, and in fact the spec does not use them, with one exception: the list type (typecode 9), which is terminated by a 255 byte.

3) No sort order is specified. In my experience, the default comparator in MapReduce sorts them as raw bytes: numerically for each byte and lexicographically for arrays. It is pluggable, so you can change that with your own Java class.
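
To make the framing concrete, here is a minimal Python sketch of how length prefixes and type codes replace delimiters. It assumes the layouts described in the Hadoop documentation: a typed bytes value is a one-byte type code followed by a type-specific payload, and a rawbytes key or value is a 32-bit big-endian length followed by that many bytes. The function names (`encode_typed`, `typed_record`, `raw_record`) are hypothetical, and only four type codes are covered.

```python
import struct

# Type codes as listed in the typedbytes package documentation.
TYPE_BYTES, TYPE_INT, TYPE_STRING, TYPE_LIST = 0, 3, 7, 9
LIST_END = 255  # marker byte that terminates a typecode-9 list


def encode_typed(obj):
    """Encode a small subset of Python values as typed bytes (sketch only)."""
    if isinstance(obj, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(obj, int):
        # type code, then a 32-bit big-endian signed integer
        return struct.pack(">Bi", TYPE_INT, obj)
    if isinstance(obj, str):
        # type code, 32-bit length, then that many UTF-8 bytes
        data = obj.encode("utf-8")
        return struct.pack(">Bi", TYPE_STRING, len(data)) + data
    if isinstance(obj, bytes):
        # type code, 32-bit length, then the bytes themselves
        return struct.pack(">Bi", TYPE_BYTES, len(obj)) + obj
    if isinstance(obj, list):
        # the one delimited case: elements follow each other until a 255 byte
        body = b"".join(encode_typed(item) for item in obj)
        return bytes([TYPE_LIST]) + body + bytes([LIST_END])
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def typed_record(key, value):
    """A typedbytes record: encoded key directly followed by encoded value."""
    return encode_typed(key) + encode_typed(value)


def raw_record(key, value):
    """A rawbytes record: <length><key bytes><length><value bytes>."""
    return (struct.pack(">i", len(key)) + key
            + struct.pack(">i", len(value)) + value)


if __name__ == "__main__":
    print(typed_record("k", 1).hex())
    print(raw_record(b"k", b"vv").hex())
```

A reader always knows how many bytes to consume next, so a key ends where its length says it ends, the value starts immediately after, and the next record starts immediately after the value.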

See https://hadoop.apache.org/docs/current2/api/org/apache/hadoop/typedbytes/package-summary.html
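
On the sort-order point, the sketch below shows what raw-byte ordering would imply for typed bytes keys. It assumes the comparator compares the serialized bytes as unsigned values, lexicographically, as described above from experience rather than from a spec. Python's `bytes` comparison has those same semantics, so `sorted` with an encoding key mimics it. The helpers `enc_int` and `enc_str` are hypothetical.

```python
import struct


def enc_int(n):
    # typecode 3, then a 32-bit big-endian signed integer
    return struct.pack(">Bi", 3, n)


def enc_str(s):
    # typecode 7, then a 32-bit length, then the UTF-8 bytes
    data = s.encode("utf-8")
    return struct.pack(">Bi", 7, len(data)) + data


# Negative integers have the high bit set, so they sort after positives.
ints_by_bytes = sorted([1, -1, 0, -2], key=enc_int)

# The length prefix is compared before the characters, so shorter strings
# come first regardless of their content.
strs_by_bytes = sorted(["aa", "b", "c", "ab"], key=enc_str)

if __name__ == "__main__":
    print(ints_by_bytes)  # [0, 1, -2, -1]
    print(strs_by_bytes)  # ['b', 'c', 'aa', 'ab']
```

If numeric or alphabetical order matters, this is the situation in which a custom comparator class would be plugged in.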

Antonio
