Pig Error 1066, Backend error : -1; NegativeArraySizeException; UDF, joda-time, HBase


I'm getting an exception from a Pig script and haven't been able to nail down the cause. I'm fairly new to Pig and have searched on various topics based on the exceptions I'm getting, but haven't found anything meaningful. From the Grunt shell and the log, I've looked for different variations of these messages:

- unable to read pigs manifest file
- java.lang.NegativeArraySizeException: -1
- ERROR 1066: Unable to open iterator for alias F. Backend error : -1

I'm using Hadoop version 2.0.0-cdh4.6.0 & Pig version 0.11.0, running from the Grunt shell.

My Pig script reads a file, does some manipulation on the data (including calling a Java UDF), joins to an HBase table, then DUMPs the output. Pretty simple. I can DUMP the intermediate result (alias B) and the data looks fine.

I've tested the Java function from Pig using the same input file and seen it return the values I'd expect, and I've also tested the function locally, outside the Pig script. The Java function takes a number of days since 01-01-1900 and uses joda-time v2.7 to return a DateTime. Initially the UDF accepted a tuple as input; I've since tried changing the input type to Byte, and most recently to String with a cast to Datetime in Pig on return, but I still get the same error.
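For reference, the conversion described above can be sketched in plain Java. This is only a sketch of the logic, not the actual `monutil.geoloc.GridIDtoDatetime` UDF: the real function extends Pig's `EvalFunc` and uses joda-time, and the assumption here that day 0 maps to 1900-01-01 may differ from the real epoch handling. `java.time` is used purely to keep the sketch self-contained:

```java
import java.time.LocalDate;

// Sketch of the days-since-1900 conversion the UDF performs.
// Assumption: day 0 corresponds to 1900-01-01. The actual UDF
// (GridIDtoDatetime) parses the day count out of the id field and
// builds the date with joda-time (e.g. a 1900-01-01 base plusDays).
public class DaysSince1900 {
    private static final LocalDate EPOCH_1900 = LocalDate.of(1900, 1, 1);

    public static LocalDate toDate(long days) {
        return EPOCH_1900.plusDays(days);
    }

    public static void main(String[] args) {
        // 31 days after the 1900-01-01 base date
        System.out.println(toDate(31));
    }
}
```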

When I change my Pig script so that it simply doesn't call the UDF, it works fine. The NegativeArray error sounds like the data is out of whack for the DUMP, possibly from some kind of format issue, but I don't see how.

Pig script

A = LOAD 'tst2_SplitGroupMax.txt' using PigStorage(',')  
as (id:bytearray, year:int, doy:int, month:int, dayOfMonth:int,  
 awh_minTemp:double, awh_maxTemp:double,  
 nws_minTemp:double, nws_maxTemp:double,  
 wxs_minTemp:double, wxs_maxTemp:double,  
 tcc_minTemp:double, tcc_maxTemp:double  
 ) ;  

register /import/pool2/home/NA1000APP-TPSDM/ejbles/Test-0.0.1-SNAPSHOT-jar-with-dependencies.jar;  

B = FOREACH A GENERATE id as msmtid, SUBSTRING(id,0,8) as gridid, SUBSTRING(id,9,20) as msmt_days,  
 year, doy, month, dayOfMonth,  
 CONCAT(CONCAT(CONCAT((chararray)year,'-'),CONCAT((chararray)month,'-')),(chararray)dayOfMonth) as msmt_dt,  
 ToDate(monutil.geoloc.GridIDtoDatetime(id)) as func_msmt_dt,  
 awh_minTemp, awh_maxTemp,  
 nws_minTemp, nws_maxTemp,  
 wxs_minTemp, wxs_maxTemp,  
 tcc_minTemp, tcc_maxTemp  
 ;  

E = LOAD 'hbase://wxgrid_detail' using org.apache.pig.backend.hadoop.hbase.HBaseStorage  
 ('loc:country, loc:fips, loc:l1 ,loc:l2, loc:latitude, loc:longitude',  
 '-loadKey=true -caster=HBaseBinaryConverter')  
 as (wxgrid:bytearray, country:chararray, fips:chararray, l1:chararray, l2:chararray,  
   latitude:double, longitude:double);  

F = join B by gridid, E by wxgrid;  

DUMP F;  -- This is where I get the exception

Here's an excerpt from what's returned in the Grunt shell -
2015-06-15 12:23:24,204 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-06-15 12:23:24,205 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502081759_916870 has failed! Stop running all dependent jobs
2015-06-15 12:23:24,205 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-06-15 12:23:24,221 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: -1
2015-06-15 12:23:24,221 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-06-15 12:23:24,223 [main] WARN  org.apache.pig.tools.pigstats.ScriptState - unable to read pigs manifest file
2015-06-15 12:23:24,224 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion      PigVersion  UserId           StartedAt            FinishedAt           Features
2.0.0-cdh4.6.0                 na1000app-tpsdm  2015-06-15 12:22:39  2015-06-15 12:23:24  HASH_JOIN

Failed!

Failed Jobs:
JobId                    Alias    Feature    Message               Outputs
job_201502081759_916870  A,B,E,F  HASH_JOIN  Message: Job failed!  hdfs://nameservice1/tmp/temp-238648079/tmp-1338617620,

Input(s):
Failed to read data from "hbase://wxgrid_detail"
Failed to read data from "hdfs://nameservice1/user/na1000app-tpsdm/tst2_SplitGroupMax.txt"

Output(s):
Failed to produce result in "hdfs://nameservice1/tmp/temp-238648079/tmp-1338617620"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_201502081759_916870

2015-06-15 12:23:24,224 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-06-15 12:23:24,234 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias F. Backend error : -1
Details at logfile: /import/pool2/home/NA1000APP-TPSDM/ejbles/pig_1434388844905.log

And here's the log -
Backend error message
---------------------
java.lang.NegativeArraySizeException: -1
        at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
        at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:233)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:640)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Ch

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias F. Backend error : -1

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias F. Backend error : -1
        at org.apache.pig.PigServer.openIterator(PigServer.java:828)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:538)
        at org.apache.pig.Main.main(Main.java:157)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NegativeArraySizeException: -1
        at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
        at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:233)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:640)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
