Mahout MinHash: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text


I am using hadoop-1.2.1 and mahout-distribution-0.8.

When I try to run MinHash clustering (MinHashDriver) with the following command:

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.minhash.MinHashDriver -i tce-data/cv.vec -o tce-data/out/cv/minHashDriver/ -ow

I get this error:

tce@osy-Inspiron-N5110:~$ $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.minhash.MinHashDriver  -i  tce-data/cv.vec  -o tce-data/out/cv/minHashDriver/ -ow
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/tce/app/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/tce/app/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.

13/09/10 18:17:46 WARN driver.MahoutDriver: No org.apache.mahout.clustering.minhash.MinHashDriver.props found on classpath, will use command-line arguments only
13/09/10 18:17:46 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --hashType=[MURMUR], --input=[tce-data/cv.vec], --keyGroups=[2], --minClusterSize=[10], --minVectorSize=[5], --numHashFunctions=[10], --numReducers=[2], --output=[tce-data/out/cv/minHashDriver/], --overwrite=null, --startPhase=[0], --tempDir=[temp], --vectorDimensionToHash=[value]}
13/09/10 18:17:48 INFO input.FileInputFormat: Total input paths to process : 1
13/09/10 18:17:50 INFO mapred.JobClient: Running job: job_201309101645_0031
13/09/10 18:17:51 INFO mapred.JobClient:  map 0% reduce 0%
13/09/10 18:18:27 INFO mapred.JobClient: Task Id : attempt_201309101645_0031_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.mahout.clustering.minhash.MinHashMapper.map(MinHashMapper.java:30)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I would appreciate any ideas.


There is 1 answer

Answered by Sandip

Cross-check a few things: the classes passed to job.setOutputKeyClass, job.setOutputValueClass, job.setMapOutputKeyClass and job.setMapOutputValueClass should match the reducer's output key, the reducer's output value, the mapper's output key and the mapper's output value classes, respectively.

Your stack trace says there is a mismatch in the Mapper. Your MinHashMapper should extend Mapper<A, B, C, D>, where C and D are the same classes passed to job.setMapOutputKeyClass(C) and job.setMapOutputValueClass(D).
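
It may also be worth checking the mapper's input types (A and B above) against the key and value classes actually stored in the input file, since the exception is thrown on entry to MinHashMapper.map. The sketch below is plain Java with hypothetical stand-in classes (LongKey, TextKey, SimpleMapper are not Hadoop or Mahout types). It only illustrates how a mapper declared with one key type throws a ClassCastException when the framework hands it a different key type at runtime. Whether this is what happens with cv.vec is an assumption, not something the log confirms.

```java
public class KeyTypeMismatchDemo {
    // Hypothetical stand-ins for Writable key types such as LongWritable and Text.
    static class LongKey {
        final long value;
        LongKey(long value) { this.value = value; }
    }

    static class TextKey {
        final String value;
        TextKey(String value) { this.value = value; }
    }

    // Simplified analogue of a generic mapper base class (input types only).
    static abstract class SimpleMapper<K, V> {
        abstract String map(K key, V value);
    }

    // Declares TextKey as its input key type.
    static class TextKeyMapper extends SimpleMapper<TextKey, String> {
        @Override
        String map(TextKey key, String value) {
            return key.value + "=" + value;
        }
    }

    // A framework calls the mapper through erased types, so the key type is
    // only checked by the cast inside the compiler-generated bridge method.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String runMapper(SimpleMapper mapper, Object key, Object value) {
        return mapper.map(key, value);
    }

    // Returns true if feeding this key to TextKeyMapper raises ClassCastException.
    static boolean failsWithClassCast(Object key) {
        try {
            runMapper(new TextKeyMapper(), key, "v");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithClassCast(new TextKey("doc1"))); // prints false
        System.out.println(failsWithClassCast(new LongKey(1L)));     // prints true
    }
}
```

If the input file's key class does turn out to differ from what the mapper declares, the usual remedy is to regenerate or convert the input so its keys have the expected type, rather than to change the job's output class settings.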