- Next, verify the Java installation with the command below:
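$ java -version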
If it has been installed, you should see a response similar to the one below:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
If not, then you have to install it first.
- Next, verify that Scala has been installed with the command below:
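$ scala -version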
That should give a response similar to the message below:
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
If not, then you have to install it.
Continuing with Apache Spark installation
- Extract the tar file
$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz
- Move the Spark folder to the desired directory, e.g. /usr/local/spark:
$ su -
Password:
# cd /home/Hadoop/Downloads/
# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark
# exit
- Add the following line to your ~/.bashrc file to put Spark on your PATH, then reload it:
export PATH=$PATH:/usr/local/spark/bin
$ source ~/.bashrc
- Verify that Spark was installed successfully on the system by starting the Spark shell:
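$ spark-shell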
If it is successful, you should see the following response:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Spark context available as sc
scala>
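As a quick sanity check (this snippet is just an illustration and only relies on the sc context shown above), you can run a small job from the scala> prompt:

scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050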
All systems that will be on the cluster must first have Spark installed and running, so you have to perform the above steps on every one of them.
Setting Up the Local Cluster
The way the Spark cluster works is that one system is the master and the rest are the slaves:
- Go to SPARK_HOME/conf and create a file with the name spark-env.sh.
There will be a spark-env.sh.template in the same folder, and that file gives you details on how to declare the various environment variables.
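One simple way to create the file is to copy the template (run from SPARK_HOME/conf):

$ cp spark-env.sh.template spark-env.sh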
- Enter the master's IP address in spark-env.sh on the master system (see the sketch below).
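On Spark 1.x standalone this is typically done with the SPARK_MASTER_IP variable; the address below is only a placeholder for your master's actual IP:

export SPARK_MASTER_IP=192.168.1.100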
- Open the slaves file in the same folder, i.e. SPARK_HOME/conf, and if there is none then create it. Note that the slaves file does not have any extension (see the sketch below).
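As a sketch, the slaves file is just a list of the worker systems, one IP address (or hostname) per line; the addresses below are placeholders:

192.168.1.101
192.168.1.102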
All these files must be saved on all systems with the same data, i.e. the master system's IP address entered in the spark-env.sh file on every system, and the IP addresses of all the slave systems entered in the slaves file on every system as well. This is very important.
- Navigate to the Spark sbin directory (/usr/local/spark/sbin) and enter the following command:
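$ sudo ./start-all.sh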
- Enter your password at the prompt.
- Go to your browser, enter IP_ADDRESS_OF_YOUR_MASTER_SYSTEM:8080 in the URL bar and press Enter.
The sudo command is very important, or you will get a "permission denied" error message.
The ./start-all.sh command is preferable to starting the master first and then starting each of the slaves afterwards; you can see how that works in the references I added, and if you prefer that route you can take it. I prefer this approach, where one command starts all the systems at once and another stops them afterwards; a rough sketch of the alternative is below.
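For reference, the per-role route uses the start-master.sh and start-slaves.sh scripts from the same sbin directory:

$ sudo ./start-master.sh
$ sudo ./start-slaves.sh

and the matching single command to stop the whole cluster afterwards is:

$ sudo ./stop-all.sh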
You might get an access error after running the start-all command, but I have actually forgotten how I overcame it. If you do get that error, please contact me and I will help resolve it. Enjoy.