Cascading / Hadoop: GroupBy / Reducer Based on Job Size


Use case: after records pass through an Each pipe, they should be processed in batches of 200. Neither grouping nor ordering matters; any partitioning of the records into groups of 200 is sufficient.

I was thinking of having the Each pipe emit the result along with a key that acts as the groupFields for a GroupBy. The GroupBy would then emit groups of size 200 that I could process with an Every. However, I don't know how many records there are up front, so I'm stuck designing a function to generate the key.
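One possible approach, since exact group composition doesn't matter: let each task assign batch keys locally, so the total record count never needs to be known. Below is a minimal sketch of such a function; the class name `BatchKeyAssigner` and the field name `batchKey` are placeholders, not part of the Cascading API. Each task instance keeps a running counter and salts its keys with a UUID generated in `prepare()`, so counters in different mapper tasks don't produce colliding keys. Note this yields groups of at most 200; the final batch in each task may be smaller.

```java
import java.util.UUID;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.operation.OperationCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

// Hypothetical function that appends a synthetic "batchKey" field to each tuple.
public class BatchKeyAssigner extends BaseOperation<BatchKeyAssigner.Context>
    implements Function<BatchKeyAssigner.Context> {

    private static final int BATCH_SIZE = 200;

    // Per-task state: a unique salt plus a running record counter.
    public static class Context {
        final String salt = UUID.randomUUID().toString();
        long count = 0;
    }

    public BatchKeyAssigner() {
        super(new Fields("batchKey")); // the single field this function emits
    }

    @Override
    public void prepare(FlowProcess flowProcess, OperationCall<Context> operationCall) {
        operationCall.setContext(new Context()); // runs once per task instance
    }

    @Override
    public void operate(FlowProcess flowProcess, FunctionCall<Context> functionCall) {
        Context ctx = functionCall.getContext();
        long batch = ctx.count++ / BATCH_SIZE; // same key for 200 consecutive records
        functionCall.getOutputCollector().add(new Tuple(ctx.salt + "-" + batch));
    }
}
```

Wired into the assembly it might look like this, where `MyBatchAggregator` is a stand-in for whatever Every operation processes a batch (`Fields.ALL` as the output selector appends `batchKey` to the incoming tuple):

```java
Pipe pipe = new Pipe("records");
pipe = new Each(pipe, new BatchKeyAssigner(), Fields.ALL);       // append batchKey
pipe = new GroupBy(pipe, new Fields("batchKey"));                // groups of <= 200
pipe = new Every(pipe, new MyBatchAggregator(), Fields.RESULTS); // process each batch
```

The point of the salt is that the key only has to be unique per (task, batch index), so the total record count never enters into it.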
