org.apache.crunch.CrunchRuntimeException: java.io.NotSerializableException

1.2k views Asked by At

I have a PTable<String, Pair<Entity1, Entity2>> which is generated at an intermediate stage of program on which i am running transformation job. Sample PTable entry:

["0067b4c054d14fe2-ACC8D37", 
 [{
    "unique_id": "0067b4c054d14fe2-ACC8D37",
    "user_id": "ACC8D37",
    "campaign_id": "3URL6GC2",
    "seller_id": "0067b4c"
}, {
    "unique_id": "0067b4c054d14fe2-ACC8D37",
    "user_id": "ACC8D37",
    "seller_id": "0067b4c"
}]]

I neeed to get a PCollection<String> where Entity2 is null

Transformer DoFn

private PCollection<String> transformPairTable(PCollection<Pair<Entity1, Entity2>> pairPCollection) {
        return pairPCollection.parallelDo(new DoFn<Pair<Entity1, Entity2>, String>() {
            @Override
            public void process(Pair<Entity1, Entity2> entityPair, Emitter<String> emitter) {
                Entity1 entity1 = entityPair.first();
                Entity2 entity2 = entityPair.second();
                if (entity2 == null) {
                    String campaignId = entity1.getCampaignId();
                    emitter.emit(campaignId);
                }
            }
        }, Avros.strings());
    }

Entity1 and Entity2 both are Generated classes from Avro schema.

But when i run the job it throws runtime exception

Exception in thread "main" org.apache.crunch.CrunchRuntimeException: java.io.NotSerializableException: com.xxx.bb.data.vv.jobs.GenerateCount
    at org.apache.crunch.impl.mr.MRPipeline.plan(MRPipeline.java:140)
    at org.apache.crunch.impl.mr.MRPipeline.runAsync(MRPipeline.java:159)
    at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:147)
    at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:94)
    at org.apache.crunch.materialize.pobject.FirstElementPObject.process(FirstElementPObject.java:44)
    at org.apache.crunch.materialize.pobject.PObjectImpl.getValue(PObjectImpl.java:71)
    at com.xxxx.bb.xxxx.xxx.jobs.GenerateNewCount.getNewCustomersPCollection(GenerateNewCount.java:239)
    at com.xxx.x.data.x.jobs.GenerateNewCount.runJob(GenerateNewCount.java:78)
    at com.xxx.x.data.xx.jobs.BaseJob.run(BaseJob.java:87)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

I tried Writables.strings() but it gave same exception.

The PTable which is used in the job has entries with Entity2 equals to null.

I tried transforming the PTable in many ways but its not working. i am not able to figure out the main reason behind it.

When i use

Entity1 p1= pairPCollection.first().getValue().first();

It throws exception mentioned below:

ERROR materialize.MaterializableIterable: Could not materialize: Avro(/tmp/crunch-1893789994/p7)
java.io.IOException: No files found to materialize at: /tmp/crunch-1893789994/p7
    at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
    at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:134)
    at org.apache.crunch.io.avro.AvroFileSource.read(AvroFileSource.java:89)
0

There are 0 answers