outputSchema using a Python UDF in Apache Pig

My Python UDF returns a list of tuples like this:

[(0.01, 12), (0.02, 6), (0.03, 12), (0.04, 19), (0.05, 29), (0.06, 42)]

The output above was printed to, and copied from, the mapper's stdout.

The two values in each tuple are cast to float and int respectively. I also printed the types, and they are indeed cast correctly:

(<type 'float'>, <type 'int'>)

Here is the decorator:

@outputSchema("stats:bag{improvement:tuple(percent:float,entityCount:int)}")
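For context, here is a minimal sketch of how the decorator and UDF fit together. The function name `improvement_stats` and its input shape are my own illustration, not from the original script; `outputSchema` is provided by Pig (via `pig_util` for streaming Python UDFs, or injected automatically for Jython UDFs), so a no-op stand-in is defined here only so the sketch runs outside Pig:

```python
# outputSchema normally comes from Pig; the fallback below is a no-op
# stand-in so this sketch can run outside a Pig job.
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema("stats:bag{improvement:tuple(percent:float,entityCount:int)}")
def improvement_stats(counts):
    # counts is assumed to be an iterable of (percent, entity_count)
    # pairs; each value is cast explicitly so the returned bag of
    # tuples matches the declared (float, int) field types.
    return [(float(p), int(c)) for p, c in counts]
```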

Here is the error message:

Error: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:479)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:442)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:422)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:269)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:646)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:477)
    ... 11 more
Caused by: java.lang.RuntimeException: Datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnion(PigAvroDatumWriter.java:132)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:131)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:113)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:378)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
    ...

Does anyone know what I did wrong in the schema?
