Writing to multiple HCatalog schemas in single reducer?


I have a set of Hadoop flows that were written before we started using Hive. When we added Hive, we configured the data files as external tables. Now we're thinking about rewriting the flows to output their results using HCatalog. Our main motivation for the change is to take advantage of dynamic partitioning.

One of the hurdles I'm running into is that some of our reducers generate multiple data sets. Today this is done with side-effect files: we write each record type out to its own file in a single reduce step. I'm wondering what my options are for doing this with HCatalog.
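For context, here's a minimal sketch of the kind of reducer I mean, using Hadoop's MultipleOutputs as a stand-in for our side-effect files (the class, output names, and routing rule are illustrative, not our actual code):

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Illustrative only: each record type lands in its own side-effect file.
// The driver must register the named outputs with
// MultipleOutputs.addNamedOutput(job, "typeA", ...) etc.
public class SplitByTypeReducer extends Reducer<Text, Text, NullWritable, Text> {
  private MultipleOutputs<NullWritable, Text> outputs;

  @Override
  protected void setup(Context context) {
    outputs = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // Hypothetical routing rule: records prefixed "A|" are one type.
      String alias = value.toString().startsWith("A|") ? "typeA" : "typeB";
      outputs.write(alias, NullWritable.get(), value);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    outputs.close();
  }
}
```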

One obvious option is to have each job generate just a single record type, reprocessing the data once for each type. I'd like to avoid this.

Another option for some jobs is to consolidate so that all records are stored under a single schema. This works well where the data was only broken apart as poor-man's partitioning, since HCatalog will take care of partitioning based on those fields. For other jobs, however, the record types are not consistent enough to share a schema.
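For the single-schema case, the job setup might look something like the sketch below. The database and table names are hypothetical; as I understand it, passing null for the partition values in OutputJobInfo.create is what enables dynamic partitioning (package names vary between HCatalog releases; this uses org.apache.hive.hcatalog):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class UnifiedSchemaDriver {
  public static Job configure(Configuration conf) throws Exception {
    Job job = Job.getInstance(conf, "unified-schema-output");
    // "mydb" and "unified_table" are hypothetical. Passing null for the
    // partition values requests dynamic partitioning: HCatalog routes each
    // record to a partition based on the partition-column values it carries.
    HCatOutputFormat.setOutput(job,
        OutputJobInfo.create("mydb", "unified_table", null));
    HCatOutputFormat.setSchema(job,
        HCatOutputFormat.getTableSchema(job.getConfiguration()));
    job.setOutputFormatClass(HCatOutputFormat.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(DefaultHCatRecord.class);
    return job;
  }
}
```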

It seems that I might be able to use the Reader/Writer interfaces to pass a set of writer contexts around, one per schema, but I haven't really thought it through (and I've only been looking at HCatalog for a day, so I may be misunderstanding the Reader/Writer interface).
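Roughly what I had in mind is below, completely untested and quite possibly misusing the DataTransfer API (HCatWriter, WriterContext); the names and structure are just my best reading of the javadocs after a day:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.transfer.DataTransferFactory;
import org.apache.hive.hcatalog.data.transfer.HCatWriter;
import org.apache.hive.hcatalog.data.transfer.WriteEntity;
import org.apache.hive.hcatalog.data.transfer.WriterContext;

public class MultiSchemaWriteSketch {

  // Master side: prepare one WriterContext per target table. The contexts
  // are serializable, so in theory they could be shipped to the reducers.
  public static Map<String, WriterContext> prepare(String db, String... tables)
      throws Exception {
    Map<String, WriterContext> contexts = new HashMap<String, WriterContext>();
    for (String table : tables) {
      WriteEntity entity =
          new WriteEntity.Builder().withDatabase(db).withTable(table).build();
      HCatWriter master =
          DataTransferFactory.getHCatWriter(entity, new HashMap<String, String>());
      contexts.put(table, master.prepareWrite());
    }
    return contexts;
  }

  // Slave side (e.g. inside the reducer): pick the context that matches the
  // record type and write through it. The master would then need to call
  // commit(context) on each writer for the output to become visible.
  public static void writeTo(WriterContext context, Iterator<HCatRecord> records)
      throws Exception {
    HCatWriter slave = DataTransferFactory.getHCatWriter(context);
    slave.write(records);
  }
}
```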

Does anybody have any experience writing to multiple schemas in a single reduce step? Any pointers would be much appreciated.

Thanks.

Andrew


1 Answer

Answered by Andrew Certain:

As best I can tell, the proper way to do this is to use the MultiOutputFormat class. The biggest help for me was the TestHCatMultiOutputFormat test in Hive.
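Distilled from that test, the driver-side setup looks roughly like this. The database and table names are hypothetical, and the package here is org.apache.hive.hcatalog (older releases use org.apache.hcatalog):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.MultiOutputFormat;
import org.apache.hive.hcatalog.mapreduce.MultiOutputFormat.JobConfigurer;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class MultiSchemaDriver {
  public static Job configure(Configuration conf, String db, String... tables)
      throws Exception {
    Job job = Job.getInstance(conf, "multi-schema-output");
    job.setOutputFormatClass(MultiOutputFormat.class);

    // Register one aliased HCatOutputFormat per target table. Each alias
    // gets its own child Job whose configuration HCatalog sets up normally.
    JobConfigurer configurer = MultiOutputFormat.createConfigurer(job);
    for (String table : tables) {
      configurer.addOutputFormat(table, HCatOutputFormat.class,
          NullWritable.class, DefaultHCatRecord.class);
      HCatOutputFormat.setOutput(configurer.getJob(table),
          OutputJobInfo.create(db, table, null));
      HCatOutputFormat.setSchema(configurer.getJob(table),
          HCatOutputFormat.getTableSchema(
              configurer.getJob(table).getConfiguration()));
    }
    configurer.configure();
    return job;
  }
}
```

In the reducer, you then route each record with MultiOutputFormat.write(alias, NullWritable.get(), record, context), where the alias names the table that matches the record's schema.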

Andrew