Run Scala Program with Spark on Hadoop


I have created a Scala program that searches for a word in a text file. I wrote the Scala file with Eclipse, then compiled it and built a jar with sbt and sbt assembly. After that I ran the .jar with Spark locally and it worked correctly. Now I want to try running this program with Spark on Hadoop; I have 1 master and 2 worker machines. Do I have to change the code? And what command do I run from the shell of the master? I have created a bucket and put the text file into Hadoop.

This is my code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object wordcount {
  def main(args: Array[String]) {
    // set up the Spark context
    val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // ask the user which word to count
    print("Enter word to look for in the HOLY BIBLE: ")
    val word = Console.readLine
    println("You entered " + word)

    // load the text file, split every line into words and keep only the matches
    val input = sc.textFile("bible.txt")
    val matchingWords = input.flatMap(line => line.split(" "))
                             .filter(x => x.equals(word))

    println("The word " + word + " appears " + matchingWords.count() + " times")
  }
}

Thanks all


1 Answer

Answered by Sathish:

Just change the following line,

val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")

to

val conf = new SparkConf().setAppName("wordcount")

This way you do not have to modify the code whenever you want to switch from local mode to cluster mode; the master can instead be passed to the spark-submit command, as follows:

spark-submit --class wordcount --master <master-url> wordcount.jar

and if you want to run your program locally, use the following command,

spark-submit --class wordcount --master local[*] wordcount.jar
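Since you mention a Hadoop cluster with one master and two workers, a common way to run Spark on Hadoop is through YARN. Assuming YARN is set up and HADOOP_CONF_DIR points at your Hadoop configuration on the machine you submit from, the command would look roughly like this:

spark-submit --class wordcount --master yarn --deploy-mode cluster wordcount.jar

With --deploy-mode cluster the driver runs inside the cluster; use --deploy-mode client if you want the driver (and its console output) on the machine you submit from.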

The Spark documentation on submitting applications lists the master URLs you can set when running the application, e.g. local[*] for local mode, spark://HOST:PORT for a Spark standalone cluster, or yarn when running on YARN.
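One more point about the code itself: since the text file now lives in Hadoop, sc.textFile can read it directly from HDFS if you give it an HDFS URI instead of a local file name. Reading the word from the console also won't work well once the driver runs on the cluster, so it is simpler to pass the word as a program argument. Here is a minimal sketch; the namenode host, port and path are just placeholders, adjust them to your cluster:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object wordcount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)

    // word to search for, passed on the command line
    val word = args(0)

    // read the input from HDFS instead of the local file system
    val input = sc.textFile("hdfs://<namenode-host>:9000/user/<you>/bible.txt")
    val matches = input.flatMap(line => line.split(" "))
                       .filter(x => x.equals(word))

    println("The word " + word + " appears " + matches.count() + " times")
  }
}

You would then append the word after the jar when submitting, for example:

spark-submit --class wordcount --master <master-url> wordcount.jar faith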