Running Spark from a local IDE

I've been banging my head against trying to run a complex Spark application locally so that I can test more quickly (without having to package and deploy to a cluster).

Some context:

  • This Spark application interfaces with the DataStax Enterprise version of Cassandra and its distributed file system, so it needs some specific jars to be provided explicitly (they are not available in Maven)
  • These jars are available on my local machine, and to "cheese" this, I tried placing them in SPARK_HOME/jars so they would be automatically added to the classpath
  • I tried to do something similar with the required configuration settings by putting them in spark-defaults.conf under SPARK_HOME/conf
  • When building this application, we do not build an uber jar, but rather do a spark-submit on the server using --jars

The problem I'm facing is that when I run the Spark application through my IDE, it doesn't seem to pick up any of these additional items from the SPARK_HOME directory (config or jars). I spent a few hours trying to get the config items to work and ended up setting them as system properties (System.setProperty) in my test case before starting the Spark session so that Spark would pick them up, so the configuration settings can be ignored.
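For context on that workaround: as far as I know, spark-defaults.conf is read by the spark-submit launcher rather than by SparkConf itself, while a default-constructed SparkConf does pick up JVM system properties whose names start with "spark.". A minimal Java sketch of copying the file's entries into system properties before the session is created might look like the following. The class and method names (SparkDefaultsLoader, loadIntoSystemProperties) are hypothetical, and the sketch assumes the file parses as a Java properties file (key and value separated by whitespace, # for comments).

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of Spark.
public class SparkDefaultsLoader {

    /**
     * Copies every "spark.*" entry from the given file into JVM system
     * properties, skipping keys that are already set. Returns the number
     * of properties copied.
     */
    public static int loadIntoSystemProperties(Path defaultsFile) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(defaultsFile, StandardCharsets.UTF_8)) {
            // Assumption: the file is compatible with the Java properties format.
            props.load(reader);
        }
        int copied = 0;
        for (String key : props.stringPropertyNames()) {
            // Only spark.* keys matter here; do not overwrite explicit settings.
            if (key.startsWith("spark.") && System.getProperty(key) == null) {
                System.setProperty(key, props.getProperty(key).trim());
                copied++;
            }
        }
        return copied;
    }
}
```

In a test, this would be called before the session is built, for example with Paths.get(System.getenv("SPARK_HOME"), "conf", "spark-defaults.conf") as the argument.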

However, I do not know how to reproduce this for the vendor-specific jar files. Is there an easy way to emulate the --jars behavior of spark-submit and somehow set up my Spark session with this jar value? Note: I am using the following command in my code to start a Spark session:

SparkSession.builder().config(conf).getOrCreate()
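One possible direction, offered as a sketch rather than a confirmed fix: spark-submit passes the value of --jars to Spark as the spark.jars property (a comma-separated list of jar paths), so the same value could be set on the conf before the session is built. The helper below only builds that comma-separated string from a directory of jars; the class and method names (JarListBuilder, commaSeparatedJars) are hypothetical, and the directory layout is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark.
public class JarListBuilder {

    /**
     * Returns the absolute paths of all *.jar files directly inside jarDir,
     * sorted and joined with commas (the format used by spark.jars).
     */
    public static String commaSeparatedJars(Path jarDir) throws IOException {
        try (Stream<Path> entries = Files.list(jarDir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .map(p -> p.toAbsolutePath().toString())
                .sorted()
                .collect(Collectors.joining(","));
        }
    }
}
```

The result would then be applied with something like conf.set("spark.jars", JarListBuilder.commaSeparatedJars(vendorJarDir)) before calling SparkSession.builder().config(conf).getOrCreate(), where vendorJarDir is a placeholder for wherever the vendor jars live. Note that classes the driver itself needs must already be on the classpath of the JVM the IDE launches, so adding the jars to the IDE run configuration (or as system-scoped or locally installed Maven dependencies) may still be necessary.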

Additional information, in case it helps:

  • The Spark version I have locally in SPARK_HOME is the same version that my code is compiling with using Maven.
  • I asked another question similar to this related to configs: Loading Spark Config for testing Spark Applications
  • When I print the SPARK_HOME environment variable in my application, I get the correct SPARK_HOME value, so I'm not sure why neither the configs nor the jar files are being picked up from there. Is it possible that, when the application runs from my IDE, Spark ignores the SPARK_HOME environment variable and uses all defaults?
