I've installed the Cloudera Hadoop-LZO package and added the following settings into my client environment safety valve:
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
However, I get the strangest native-lzo library not available error:
13/08/05 23:59:06 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/08/05 23:59:06 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6298911ef75545c61859c08add6a74a83e0183ad]
13/08/05 23:59:07 INFO mapred.JobClient: Running job: job_201308052350_0003
13/08/05 23:59:08 INFO mapred.JobClient: map 0% reduce 0%
13/08/05 23:59:18 INFO mapred.JobClient: Task Id : attempt_201308052350_0003_m_000000_0, Status : FAILED
java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzopCodec.getDecompressorType(LzopCodec.java:96)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:131)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Why would it say that the native-lzo library was loaded successfully, and then complain that the native-lzo library was not available? Are these exceptions coming out of the DataNodes?
The issue was that we did not have lzop installed on the datanodes themselves! After the following instruction, all was well:
Hope that helps!