ClassNotFoundException with Scalding on Zeppelin managed on YARN


I'm trying to get Scalding working on Zeppelin with YARN. I followed the steps in the docs to build the interpreter and set up the classpath override. In local mode, my code executes properly. However, when I run on my cluster via YARN, my jobs fail with:

Error: java.lang.ClassNotFoundException: cascading.CascadingException

or

Error: java.lang.ClassNotFoundException: cascading.tuple.TupleException

What is even stranger to me is that I can go into Zeppelin and execute:

import cascading.tuple.TupleException
import cascading.CascadingException

Both imports succeed, so the interpreter itself has no problem finding those classes. I only get the ClassNotFoundException when I actually use Scalding on YARN, for example by loading data into a typed pipe and dumping it. Any ideas on how to debug or fix this?

Best answer, by Prasad Wagle

It looks like the cascading jars are not distributed to the YARN cluster. Please add "zeppelin/interpreter/scalding/*" to the args.string property of the scalding interpreter.

Here's the args.string we use:

-libjars /home/zeppelin-user/zeppelin/interpreter/scalding/*,/home/zeppelin-user/deploy-bundle-201608111417/libs/* -Dscalding.reducer.estimator.classes=com.twitter.scalding.reducer_estimation.InputSizeReducerEstimator -Delephantbird.use.combine.input.format=true -Delephantbird.combine.split.size=134217728 --hdfs --repl

The tmpjars configuration property lists the jars that are distributed to the YARN cluster. You can print its contents with the command below:

%scalding 
mode.asInstanceOf[Hdfs].conf.get("tmpjars").split(",").foreach(println)
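
If the tmpjars list is long, it may be easier to filter it for the jars you care about. Below is a minimal sketch of that idea in plain Scala. The helper name jarsMatching and the sample jar paths are made up for illustration and are not part of Scalding or Zeppelin.

```scala
// Hypothetical helper (not part of Scalding or Zeppelin): returns the entries
// of a comma-separated jar list whose file name contains `fragment`.
def jarsMatching(jarList: String, fragment: String): Seq[String] =
  Option(jarList).toSeq                 // tolerate a null value (property not set)
    .flatMap(_.split(",").toSeq)        // tmpjars is a comma-separated list
    .map(_.trim)
    .filter(_.nonEmpty)
    .filter(entry => entry.substring(entry.lastIndexOf('/') + 1).contains(fragment))

// Made-up sample value, for illustration only.
val sample = "file:/tmp/libs/cascading-core.jar,file:/tmp/libs/scalding-core.jar"
jarsMatching(sample, "cascading").foreach(println)
```

In a %scalding paragraph you could pass mode.asInstanceOf[Hdfs].conf.get("tmpjars") as the first argument instead of the sample string. An empty result for "cascading" would suggest that the Cascading jars are not being shipped to the cluster.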