I'm trying to get Scalding working on Zeppelin while using YARN. I followed the steps in the docs here to build the interpreter and set up the classpath override. When I run in local mode, code executes properly. However, when I run on my cluster via YARN, my jobs fail with:
Error: java.lang.ClassNotFoundException: cascading.CascadingException
or
Error: java.lang.ClassNotFoundException: cascading.tuple.TupleException
What is even stranger to me is that I can go into Zeppelin and execute:
import cascading.tuple.TupleException
import cascading.CascadingException
Both imports succeed without any problem. It is only when I actually use Scalding on YARN, for example by loading data into a typed pipe and dumping it, that I get the ClassNotFoundException. Any ideas on how to debug or fix this?
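It looks like the Cascading jars are not distributed to the YARN cluster. The imports succeed because the jars are on the interpreter's local classpath, but the tasks running on the cluster nodes cannot see them. Please add "zeppelin/interpreter/scalding/*" to the args.string property of the Scalding interpreter.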
Here's the args.string we use:
-libjars /home/zeppelin-user/zeppelin/interpreter/scalding/,/home/zeppelin-user/deploy-bundle-201608111417/libs/ -Dscalding.reducer.estimator.classes=com.twitter.scalding.reducer_estimation.InputSizeReducerEstimator -Delephantbird.use.combine.input.format=true -Delephantbird.combine.split.size=134217728 --hdfs --repl
The tmpjars property of the job configuration lists the jars that are distributed to the YARN cluster. You can see its contents with the command below:
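Something along these lines should work from a Zeppelin paragraph, as a rough sketch rather than a definitive command. It assumes the REPL was started with --hdfs, that the REPL's `mode` value is in scope, and that it is a HadoopMode exposing `jobConf`. The `jarsMatching` helper is a hypothetical name introduced here only to filter the list.

```scala
// Hypothetical helper (not part of Scalding or Zeppelin): split a
// comma-separated tmpjars value and keep the entries that mention `needle`.
def jarsMatching(tmpjars: String, needle: String): Seq[String] =
  Option(tmpjars).toSeq            // an unset property comes back as null
    .flatMap(_.split(","))
    .map(_.trim)
    .filter(_.toLowerCase.contains(needle.toLowerCase))

// Assumption: started with --hdfs, so `mode` is a HadoopMode that carries
// the Hadoop job configuration. Uncomment inside the Scalding interpreter:
// val tmpjars = mode.asInstanceOf[com.twitter.scalding.HadoopMode].jobConf.get("tmpjars")
// jarsMatching(tmpjars, "cascading").foreach(println)
```

If nothing is printed, the Cascading jars are not being shipped to the cluster, which would be consistent with the ClassNotFoundException above.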