To test and learn Spark functions, developers need the latest version of Spark, because APIs and methods from releases earlier than 2.0 are obsolete and no longer work in the newer versions. This poses a challenge: developers are forced to install Spark manually, which wastes a considerable amount of development time.
How do I use a later version of Spark on the Quickstart VM?
No one should waste the setup time that I did, so here is the solution.
SPARK 2.2 Installation Setup on Cloudera VM
Step 1: Download the Cloudera Quickstart VM from the link:
Prefer the VMware platform, as it is easy to use; in any case, all of the options are viable.
The entire tar file is around 5.4 GB. We need to provide a business email ID, as the download form will not accept personal email IDs.
Step 2: The virtual environment requires around 8 GB of RAM; please allocate sufficient memory to avoid performance glitches.
Step 3: Please open the terminal and switch to the root user:
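On the stock Quickstart VM, the root password is typically “cloudera”:
su root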
Step 4: Cloudera provides Java 1.7.0_67, which is old and does not meet our needs. To avoid Java-related exceptions, please install Java 1.8 with the following steps (a consolidated command sketch follows the list):
Downloading Java: obtain jdk-8u31-linux-x64.tar.gz from the Oracle JDK download page (an Oracle account may be required).
Switch to the /usr/java/ directory with the “cd /usr/java/” command.
Copy the downloaded Java tar file to the /usr/java/ directory.
Untar the archive with “tar -zxvf jdk-8u31-linux-x64.tar.gz”.
Open the profile file with the command “vi ~/.bash_profile”.
Export JAVA_HOME to point to the new Java directory.
Save and exit.
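Putting those steps together as a sketch (the Downloads path and the extracted directory name jdk1.8.0_31 are assumptions based on this JDK build):
cd /usr/java/
cp /home/cloudera/Downloads/jdk-8u31-linux-x64.tar.gz .   # assumed download location
tar -zxvf jdk-8u31-linux-x64.tar.gz
# then add to ~/.bash_profile:
export JAVA_HOME=/usr/java/jdk1.8.0_31
export PATH=$JAVA_HOME/bin:$PATH   # optional, so the new Java is found first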
To reflect the above change, the following command needs to be executed in the shell:
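source ~/.bash_profile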
The Cloudera VM provides Spark 1.6 by default. However, the 1.6 APIs are old and do not match production environments, so we need to download and manually install Spark 2.2.
Switch to the /opt/ directory with the command:
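cd /opt/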
Download Spark with the command:
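A sketch assuming the Hadoop 2.7 build of Spark 2.2.0 from the Apache release archive (pick whichever mirror and build matches your needs):
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz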
Untar the Spark tarball with the following command:
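tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz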
We need to define some environment variables as default settings:
Please open the configuration file with the following command:
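A reasonable choice for a standalone install is Spark’s own environment file (the exact file and path are assumptions based on the /opt/ extraction above):
vi /opt/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh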
Paste the following configurations in the file:
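A minimal sketch of such settings (every value here is an assumption; adjust for your VM):
export JAVA_HOME=/usr/java/jdk1.8.0_31
export HADOOP_CONF_DIR=/etc/hadoop/conf   # Cloudera’s client configs
export SPARK_MASTER_HOST=quickstart.cloudera   # the VM’s default hostname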
Save and exit.
We need to start Spark with the following command:
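Assuming the standalone scripts shipped in the tarball:
/opt/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh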
Export SPARK_HOME:
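For example, in ~/.bash_profile (the path assumes the /opt/ extraction; adding bin to PATH lets spark-shell resolve later):
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH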
Change the permissions of the directory:
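The target directory is an assumption here; opening up the Spark install directory is the usual intent, so the non-root cloudera user can run it:
chmod -R 777 /opt/spark-2.2.0-bin-hadoop2.7   # permissive, but acceptable on a throwaway VM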
Try “spark-shell”; it should work.
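Once the shell is up, spark.version should report the new version:
scala> spark.version
res0: String = 2.2.0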