I have created a Scala program that searches for a word in a text file. I wrote the Scala file with Eclipse, then compiled it and built a jar with sbt and sbt assembly. After that I ran the .jar with Spark in local mode and it runs correctly. Now I want to try to run this program using Spark on Hadoop; I have 1 master and 2 worker machines. Do I have to change the code, and what command do I run from the shell of the master? I have created a bucket and I have put the text file in Hadoop.
This is my code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object wordcount {
  def main(args: Array[String]): Unit = {
    // Set up the Spark context
    val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Ask the user which word to look for
    print("Enter word to look for in the HOLY BIBLE: ")
    val word = Console.readLine
    println("You entered " + word)

    // Read the text file once, split each line into words and keep only the matches
    val input = sc.textFile("bible.txt")
    val matches = input
      .flatMap(line => line.split(" "))
      .filter(x => x.equals(word))

    println("The word " + word + " appears " + matches.count() + " times")

    sc.stop()
  }
}
Thanks all
Just change the following line,
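val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")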
to
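val conf = new SparkConf().setAppName("wordcount")  // no setMaster here; the master is supplied at submit time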
This way you do not have to modify the code whenever you want to switch from local mode to cluster mode. The master option can instead be passed via the spark-submit command, for example as follows. This is a sketch that assumes YARN as the cluster manager; the --class value must match your object name and the jar name is a placeholder for whatever sbt assembly produced, so adjust both. If you run the Spark standalone master instead, use spark://HOST:PORT as the master.
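spark-submit \
  --class wordcount \
  --master yarn \
  --deploy-mode client \
  wordcount-assembly-1.0.jar   # placeholder jar name; use the jar built by sbt assembly

--deploy-mode client keeps the driver on the machine you submit from, which matters here because the program reads the search word from standard input.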
If you want to run your program locally, use the following command (same placeholder class and jar names as above),
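spark-submit \
  --class wordcount \
  --master local[*] \
  wordcount-assembly-1.0.jar   # placeholder jar name; use the jar built by sbt assembly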
Here is the list of master options that you can set when running the application (see the "Master URLs" section of the Spark documentation: https://spark.apache.org/docs/latest/submitting-applications.html).
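In short, from the Spark documentation:

local - run Spark locally with one worker thread
local[K] - run Spark locally with K worker threads
local[*] - run Spark locally with as many worker threads as logical cores on your machine
spark://HOST:PORT - connect to a Spark standalone cluster master
mesos://HOST:PORT - connect to a Mesos cluster
yarn - connect to a YARN cluster; the Hadoop configuration is picked up from HADOOP_CONF_DIR or YARN_CONF_DIR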