How to integrate Apache Spark with Spring MVC web application for interactive user sessions


I am trying to build a Movie Recommender System using Apache Spark MLlib. I have written the recommender code in Java, and it works fine when run with the spark-submit command.

My run command looks like this

bin/spark-submit --jars /opt/poc/spark-1.3.1-bin-hadoop2.6/mllib/spark-mllib_2.10-1.0.0.jar --class "com.recommender.MovieLensALSExtended" --master local[4] /home/sarvesh/Desktop/spark-test/recommender.jar /home/sarvesh/Desktop/spark-test/ml-latest-small/ratings.csv /home/sarvesh/Desktop/spark-test/ml-latest-small/movies.csv

Now I want to use my recommender in a real-world scenario, as a web application in which I can query the recommender for results.

I want to build a Spring MVC web application that can interact with an Apache Spark context and return results when asked.

My question is: how can I build an application that interacts with Apache Spark running on a cluster, so that when a request reaches the controller, it takes the user's query and fetches the same result that the spark-submit command outputs on the console?

As far as I have searched, I found that we can use Spark SQL and integrate over JDBC, but I did not find any good example.

Thanks in advance.


There are 5 answers

Biju CD

For isolating user sessions and showing each user their own results, you may need to use queues bound to a user identity. In case the results take time to compute, this identity lets you deliver the respective results to the right user.
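A minimal sketch of that idea, assuming an in-memory map from user id to a result queue (all class and method names here are hypothetical, not from the answer):

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Hypothetical holder keeping one result queue per user id, so that
    // slow Spark jobs can publish results the user later polls for.
    public class UserResultQueues {

        private final ConcurrentHashMap<String, BlockingQueue<List<String>>> queues =
                new ConcurrentHashMap<>();

        // Called from the Spark job when a user's recommendations are ready.
        public void publish(String userId, List<String> recommendations) {
            queues.computeIfAbsent(userId, id -> new LinkedBlockingQueue<>())
                  .offer(recommendations);
        }

        // Called from the web layer; blocks briefly, returns null on timeout.
        public List<String> poll(String userId) throws InterruptedException {
            return queues.computeIfAbsent(userId, id -> new LinkedBlockingQueue<>())
                         .poll(5, TimeUnit.SECONDS);
        }
    }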

ravi ranjan

Just expose the Spark configuration, context, and session as Spring beans:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkConfig {

    // Property names are illustrative; read from application.properties
    @Value("${spark.app.name}")
    private String appName;

    @Value("${spark.home}")
    private String sparkHome;

    @Value("${spark.master.uri}")
    private String masterUri;

    @Bean
    public SparkConf sparkConf() {
        SparkConf sparkConf = new SparkConf()
                .setAppName(appName)
                .setSparkHome(sparkHome)
                .setMaster(masterUri);

        return sparkConf;
    }

    @Bean
    public JavaSparkContext javaSparkContext() {
        return new JavaSparkContext(sparkConf());
    }

    @Bean
    public SparkSession sparkSession() {
        return SparkSession
                .builder()
                .sparkContext(javaSparkContext().sc())
                .appName("Java Spark Ravi")
                .getOrCreate();
    }
}

The same can be done with XML-based configuration.

Fully working code with Spring and Spark is available here:

https://github.com/ravi-code-ranjan/spark-spring-seed-project
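
With these beans in place, a Spring MVC controller can inject the SparkSession and run queries per request. A minimal sketch, assuming a view named ratings was registered at startup (the endpoint, view name, and query are illustrative, not from the linked project):

    import java.util.List;
    import java.util.stream.Collectors;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class RecommendationController {

        private final SparkSession sparkSession;

        public RecommendationController(SparkSession sparkSession) {
            this.sparkSession = sparkSession;
        }

        // Runs a Spark SQL query per request against a view registered at startup.
        @GetMapping("/recommendations/{userId}")
        public List<String> recommend(@PathVariable long userId) {
            return sparkSession
                    .sql("SELECT movieId FROM ratings WHERE userId = " + userId)
                    .collectAsList()
                    .stream()
                    .map(Row::toString)
                    .collect(Collectors.toList());
        }
    }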

Amanpreet Khurana

I am a bit late, but this can help other users. If the requirement is to fetch data from Spark remotely, then you can consider using HiveThriftServer2. This server exposes Spark SQL tables (cached and temporary) as a JDBC/ODBC database.

So, you can connect to Spark with a JDBC/ODBC driver and access data from the SQL tables.

To do the above:

  1. Include this code in your Spark application:

    A. Create the Spark conf with the following properties:

    config.set("hive.server2.thrift.port","10015");
    config.set("spark.sql.hive.thriftServer.singleSession", "true");
    

    B. Then, pass the SQL context to the Thrift server and start it as below:

     HiveThriftServer2.startWithContext(session.sqlContext());
    

This will start the Thrift server with the SQL context of your application, so it will be able to return data from the tables created in that context. A combined server-side sketch follows the client code below.

  2. On the client side, you can use the code below to connect to Spark SQL:

    // Requires the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver)
    // on the client classpath.
    Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10015/default", "", "");

    Statement stmt = con.createStatement();
    ResultSet rs = stmt.executeQuery("select count(1) from ABC");
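
Putting step 1 together, a minimal server-side sketch might look like the following. It assumes the spark-hive-thriftserver dependency is on the classpath and that a ratings CSV is registered as the temp view ABC (both are assumptions, not from the answer):

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;

    public class ThriftServerApp {
        public static void main(String[] args) {
            SparkConf config = new SparkConf()
                    .setAppName("thrift-server-app")
                    .setMaster("local[4]");
            // Properties from step A
            config.set("hive.server2.thrift.port", "10015");
            config.set("spark.sql.hive.thriftServer.singleSession", "true");

            SparkSession session = SparkSession.builder()
                    .config(config)
                    .enableHiveSupport()
                    .getOrCreate();

            // Register some data so JDBC clients have a table to query
            session.read()
                   .option("header", "true")
                   .csv(args[0])
                   .createOrReplaceTempView("ABC");

            // Step B: expose this session's tables over JDBC/ODBC
            HiveThriftServer2.startWithContext(session.sqlContext());
        }
    }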
    
Huy Banh

To interact with the data model (i.e., call its predict method), you could build a REST service inside the driver. The service listens for requests, invokes the model's predict method with input from the request, and returns the result.

http4s (https://github.com/http4s/http4s) could be used for this purpose.
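In the Spring MVC setting of the question, the same idea is a controller in the driver JVM that holds the trained ALS model. A hedged sketch, assuming the MatrixFactorizationModel was trained at startup and kept in memory (the endpoint name and recommendation count are illustrative):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
    import org.apache.spark.mllib.recommendation.Rating;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class ModelController {

        private final MatrixFactorizationModel model;

        public ModelController(MatrixFactorizationModel model) {
            this.model = model; // trained once at startup, reused per request
        }

        // Returns the top 10 product recommendations for a user.
        @GetMapping("/predict/{userId}")
        public Rating[] predict(@PathVariable int userId) {
            return model.recommendProducts(userId, 10);
        }
    }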

Spark SQL is not relevant here, as it is for data analytics (which you have already done) with SQL capabilities.

Hope this helps.

eugenio calabrese

For this kind of situation, a REST interface was developed for launching Spark jobs and sharing their context:

Have a look at the documentation here:

https://github.com/spark-jobserver/spark-jobserver
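
For illustration, the typical flow from the spark-jobserver README looks roughly like this (endpoint names are taken from its docs; the jar, class, and context names are placeholders):

    # Upload the application jar under the name "recommender"
    curl --data-binary @recommender.jar localhost:8090/jars/recommender

    # Create a long-lived context so the model/data stay loaded between requests
    curl -d "" 'localhost:8090/contexts/rec-context'

    # Run a job synchronously in that shared context
    curl -d "input.user = 42" \
      'localhost:8090/jobs?appName=recommender&classPath=com.recommender.RecommendJob&context=rec-context&sync=true'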