How to do a Map side full outer join in Apache Crunch ( Join type FULL_OUTER_JOIN not supported by MapsideJoinStrategy )

Question


670 views Asked by user3500433 At 20 August 2015 at 06:31

Hi, I am trying to do a map-side join in Crunch using the MapsideJoinStrategy class. It works fine for an inner join, but for a full outer join it gives this error: "Join type FULL_OUTER_JOIN not supported by MapsideJoinStrategy"

There is 1 answer

**tworec** · Answer 1 · 2016-03-04T09:22:08+00:00

MapsideJoinStrategy cannot perform a RIGHT_OUTER_JOIN, and therefore cannot perform a FULL_OUTER_JOIN either. This is impossible by design: all of the work happens in the mappers, with no reduce phase. Since there can be more than one mapper, no single mapper sees the whole left side, so it cannot determine which right-side keys have no matching key on the left side.
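To make the reasoning above concrete, here is a minimal plain-Java sketch that simulates two mappers. It does not use the Crunch API; the class name `MapsideJoinSketch`, its helper methods, and the sample data are all hypothetical and exist only for illustration. Each simulated mapper holds the full right-side table in memory but sees only its own split of the left side.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation of a map-side join; not part of Apache Crunch.
public class MapsideJoinSketch {

    // Small helper to build a (key, value) record for the left side.
    public static Map.Entry<String, String> entry(String key, String value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // One simulated mapper: streams its left-side split and looks each key up
    // in the in-memory right-side table. A missing match yields "null", so
    // inner and left outer joins can be decided record by record.
    public static List<String> mapperJoin(List<Map.Entry<String, String>> leftSplit,
                                          Map<String, String> rightTable) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> record : leftSplit) {
            String rightValue = rightTable.get(record.getKey());
            out.add(record.getKey() + "=(" + record.getValue() + "," + rightValue + ")");
        }
        return out;
    }

    // Right-side keys that this mapper did not see in its own split.
    // This is only a local view: another mapper may hold the matching key.
    public static Set<String> unmatchedRightKeys(List<Map.Entry<String, String>> leftSplit,
                                                 Map<String, String> rightTable) {
        Set<String> unmatched = new TreeSet<>(rightTable.keySet());
        for (Map.Entry<String, String> record : leftSplit) {
            unmatched.remove(record.getKey());
        }
        return unmatched;
    }

    public static void main(String[] args) {
        Map<String, String> right = new HashMap<>();
        right.put("a", "R1");
        right.put("b", "R2");
        right.put("c", "R3");

        // The left side is split across two mappers.
        List<Map.Entry<String, String>> split1 = Arrays.asList(entry("a", "L1"));
        List<Map.Entry<String, String>> split2 = Arrays.asList(entry("b", "L2"), entry("d", "L4"));

        System.out.println(mapperJoin(split1, right));         // [a=(L1,R1)]
        System.out.println(mapperJoin(split2, right));         // [b=(L2,R2), d=(L4,null)]
        System.out.println(unmatchedRightKeys(split1, right)); // [b, c]
        System.out.println(unmatchedRightKeys(split2, right)); // [a, c]
    }
}
```

In this sketch only key `c` is truly unmatched on the left side, yet mapper 1 believes `b` is unmatched and mapper 2 believes `a` is. Without a step that combines the views of all mappers, the right-side rows of an outer join cannot be emitted correctly.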

For FULL_OUTER_JOIN, use DefaultJoinStrategy, which performs the join in a reduce phase.

I've extended BloomFilterJoinStrategy to support all join types. Here is the pull request on GitHub.

TechQA.
