Unable to launch sparkR shell in spark-1.4.0

I downloaded Spark-1.4.0 today and tried to launch the sparkR shell in both Linux and Windows environments, but the command sparkR from the bin directory is not working. If anyone has successfully launched the sparkR shell, please let me know.

Thanks Sanjay

There is 1 answer

mnm

I can help you with the setup for Windows. Unfortunately, I do not know about Linux. My solution was as follows, which I have also posted on my blog.

The one limitation of this solution is that it works only from the command line interpreter: you can invoke sparkR from the command prompt, but not from a front-end IDE like RStudio. I’m still trying to figure out how to get sparkR working in RStudio.

The trick is to ensure that you set the environment variables correctly. I’m using Windows 7 HP edition, 64-bit. The first step is to download Maven and SBT.

Create a new system variable and set the variable name as JAVA_HOME (in case Java is not installed on your computer then follow these steps). Next set the variable value as the JDK path. In my case it is ‘C:\Program Files\Java\jdk1.7.0_79\’ (please type the path without the single quotes).

Similarly, create a new system variable and name it PYTHON_PATH. Set the variable value as the Python path on your computer. In my case it is ‘C:\Python27\’ (please type the path without the single quotes).

Create a new system variable and name it HADOOP_HOME. Set the variable value as C:\winutils. (Note: There is no need to install Hadoop. The Spark shell only requires the Hadoop path, which in this case points to winutils, which lets us compile the Spark program in a Windows environment.)

Create a new system variable and name it SPARK_HOME. Assign the variable value as the path to your Spark binary location. In my case it is ‘C:\SPARK\BIN’.

Create a new system variable and name it SBT_HOME. Assign the variable value as the path to your SBT installation. In my case it is ‘C:\PROGRAM FILES (x86)\SBT\’.

Create a new system variable and name it MAVEN_HOME. Assign the variable value as the path to your Maven installation. In my case it is ‘C:\PROGRAM FILES\APACHE MAVEN 3.3.3\’.

Once all these variables have been created, select the “Path” variable under “System variables” and click on the Edit button. A window called “Edit System variable” will pop up. Leave the variable name “Path” as it is. In the variable value, append the following string:

%JAVA_HOME%\bin;%PYTHON_PATH%;%SPARK_HOME%;%HADOOP_HOME%;%MAVEN_HOME%\bin;%M3_HOME%\bin;

Click the OK button to close the environment variables window.
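Since the whole setup depends on these variables being right, it may help to check them before launching anything. The following is only a rough sketch in Python, not part of the original setup: the function name check_env and the status strings are made up for illustration, and the variable names are the ones used in the steps above. Run it from a newly opened command prompt so that the updated variables are visible.

```python
import os

# Variable names taken from the steps above; adjust if yours differ.
REQUIRED_VARS = ["JAVA_HOME", "PYTHON_PATH", "HADOOP_HOME",
                 "SPARK_HOME", "SBT_HOME", "MAVEN_HOME"]


def check_env(env, names=REQUIRED_VARS):
    """Return {name: status} for each variable name.

    status is one of: 'ok', 'missing', 'quoted', 'not a directory'.
    (Hypothetical helper, for illustration only.)
    """
    report = {}
    for name in names:
        value = env.get(name)
        if not value:
            # Variable is unset or empty.
            report[name] = "missing"
        elif value[0] in "'\"":
            # The path was typed with the surrounding quotes.
            report[name] = "quoted"
        elif not os.path.isdir(value):
            # The value does not point to an existing folder.
            report[name] = "not a directory"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_env(os.environ).items():
        print(f"{name}: {status}")
```

Anything reported as other than ok is worth fixing in the environment variables window first.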

Now open up the terminal (the command prompt window) and invoke sparkR by typing the command sparkR, or PySpark by typing the command pyspark. If you want to invoke the Scala shell, the command is spark-shell.
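If one of these commands is reported as not recognized, the usual cause is that the Path entry does not reach the folder holding the launcher. A small sketch, again in Python and only as an illustration (find_launchers is a made-up helper name), shows which launchers can be found on the Path. On Windows, shutil.which also tries the extensions listed in PATHEXT, so a launcher shipped as a .cmd file should be matched as well.

```python
import shutil

# Launcher names mentioned in this thread.
LAUNCHERS = ["sparkR", "pyspark", "spark-shell"]


def find_launchers(names=LAUNCHERS, path=None):
    """Map each launcher name to the full path found on PATH, or None.

    path=None means "use the PATH of the current process".
    (Hypothetical helper, for illustration only.)
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_launchers().items():
        print(f"{name}: {location or 'not found on Path'}")
```

A launcher shown as not found means the folder containing it still needs to be added to the Path variable.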

Hope this helps.

Cheers