Cascading / Hadoop: GroupBy / Reducer Based on Job Size


Use case: after records pass through an Each pipe, they should be processed in batches of 200. Neither grouping nor ordering matters; any partitioning of the records into groups of 200 is sufficient.

I was thinking of having the Each pipe emit the result along with a key that acts as the groupFields for a GroupBy. The GroupBy would then emit groups of size 200 that I could process with an Every. However, I don't know how many records there are up front, so I'm stuck designing a function to generate the key.
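One possible approach, since exact group composition doesn't matter: let each task assign batch keys locally, so the total record count never needs to be known. Below is a minimal sketch of such a function; the class name `BatchKeyAssigner` and the field name `batchKey` are placeholders, not part of the Cascading API. Each task instance keeps a running counter and salts its keys with a UUID generated in `prepare()`, so counters in different mapper tasks don't produce colliding keys. Note this yields groups of at most 200; the final batch in each task may be smaller.

```java
import java.util.UUID;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.operation.OperationCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

// Hypothetical function that appends a synthetic "batchKey" field to each tuple.
public class BatchKeyAssigner extends BaseOperation<BatchKeyAssigner.Context>
    implements Function<BatchKeyAssigner.Context> {

    private static final int BATCH_SIZE = 200;

    // Per-task state: a unique salt plus a running record counter.
    public static class Context {
        final String salt = UUID.randomUUID().toString();
        long count = 0;
    }

    public BatchKeyAssigner() {
        super(new Fields("batchKey")); // the single field this function emits
    }

    @Override
    public void prepare(FlowProcess flowProcess, OperationCall<Context> operationCall) {
        operationCall.setContext(new Context()); // runs once per task instance
    }

    @Override
    public void operate(FlowProcess flowProcess, FunctionCall<Context> functionCall) {
        Context ctx = functionCall.getContext();
        long batch = ctx.count++ / BATCH_SIZE; // same key for 200 consecutive records
        functionCall.getOutputCollector().add(new Tuple(ctx.salt + "-" + batch));
    }
}
```

Wired into the assembly it might look like this, where `MyBatchAggregator` is a stand-in for whatever Every operation processes a batch (`Fields.ALL` as the output selector appends `batchKey` to the incoming tuple):

```java
Pipe pipe = new Pipe("records");
pipe = new Each(pipe, new BatchKeyAssigner(), Fields.ALL);       // append batchKey
pipe = new GroupBy(pipe, new Fields("batchKey"));                // groups of <= 200
pipe = new Every(pipe, new MyBatchAggregator(), Fields.RESULTS); // process each batch
```

The point of the salt is that the key only has to be unique per (task, batch index), so the total record count never enters into it.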
