Issues with Scala ScriptEngine inside spark submit application

331 views Asked by At

I am working on a system where I let users write DSLS and I load it as instances of my Type during runtime and these can be applied on top of RDDs. The entire application runs as a spark-submit application and I use ScriptEngine engine to compile DSLs written in Scala itself. Every tests works fine in SBT and IntelliJ. But while doing a spark-submit my own types available in my fat-jar is not available to import in Script. I initialize script engine as follows.

val engine: ScriptEngine = new ScriptEngineManager().getEngineByName("scala")
private val settings: Settings = engine.asInstanceOf[scala.tools.nsc.interpreter.IMain].settings
settings.usejavacp.value = true

settings.embeddedDefaults[DummyClass]
private val loader: ClassLoader = Thread.currentThread().getContextClassLoader
settings.embeddedDefaults(loader)

It seems like this is a problem with classloader during spark-submit. But I am not able to figure out the reason why my own types in my jar which also has the main program for spark-submit is unavailable in my script which is created in same JVM. scala scala-compiler,scala-reflect and scala-library versions are 2.11.8. Some help will be greatly appreciated.

2

There are 2 answers

1
paulsonvincent On

I have found a working solution. By going through code and lot of debugging, I finally found out that ScriptEngine creates a Classloader for itself by consuming Classpath string of Classloader used to create it. In case of spark-submit, spark creates a special classloader which can read from both local and hdfs files. But classpath string obtained from this classloader will not have our application jars which is present in HDFS.

By manually appending my application jar to the ScriptEngine classpath before initialising it solved my problems. For this to work I had to locally download my application jar in HDFS to local before appending it.

0
Daniel Darabos On

If you instantiate the Scala interpreter directly instead of via ScriptEngineManager, you can pass in settings and override the classpath:

val cl = java.lang.Thread.currentThread.getContextClassLoader
val jar = cl.asInstanceOf[java.net.URLClassLoader].getURLs.toList.head.toString
val settings = new scala.tools.nsc.Settings()
settings.classpath.value = jar
val engine = scala.tools.nsc.interpreter.Scripted(settings = settings)