Friday, December 15, 2017

Apache Spark Cluster On Local Machines Setup

Before setting up an Apache Spark cluster in your server environment, you might want to test it by setting up a similar configuration on your local machines and playing around with it.


Spark Installation 
Download Apache Spark from the official downloads page, or install it from the command line.
  • Next, verify the Java installation with the command below:

$ java -version
If Java is installed, you should see a response similar to the one below:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
If not, then you have to install it.
  • Next, verify that Scala has been installed:

$ scala -version
That should give a response similar to the message below:
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
If not, then you have to install it.

Continuing with the Apache Spark installation:
  • Extract the tar file

$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz 
  • Move the Spark folder to the desired directory, e.g. /usr/local/spark

$ su -
Password:
# cd /home/Hadoop/Downloads/
# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark
# exit
  • Add the following line to the ~/.bashrc file. This adds the location of the Spark binaries to the PATH variable.

export PATH=$PATH:/usr/local/spark/bin
Source the ~/.bashrc file with the following command:
$ source ~/.bashrc
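To confirm that the PATH change took effect, you can check whether the shell now finds the spark-shell binary:

```shell
# Prints the full path of spark-shell if it is on the PATH,
# or a short message if it is not
command -v spark-shell || echo "spark-shell not found on PATH"
```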
  • Verify the successful Spark installation on the desired system:

$ spark-shell
If it is successful, you should see a response similar to the one below:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Spark context available as sc

scala>

Every system that will be part of the cluster must have Spark installed, so you have to perform the steps above on all of them.

Setting-Up The Local Cluster
In a Spark standalone cluster, one system is the master and the rest are the slaves:
  • Go to SPARK_HOME/conf and create a file with the name spark-env.sh.

There will be a spark-env.sh.template file in the same folder; it shows you how to declare the various environment variables.
  • Enter the master's IP address in the spark-env.sh file.
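As a minimal sketch, the spark-env.sh file could contain a single line like the one below (192.168.1.100 is a made-up example address; substitute your master system's actual IP). SPARK_MASTER_IP is the variable name used by the Spark 1.x standalone scripts:

```shell
# SPARK_HOME/conf/spark-env.sh
export SPARK_MASTER_IP=192.168.1.100   # example IP address of the master system
```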


  • Open the slaves file in the same folder, i.e. SPARK_HOME/conf; if there is none, create it. Note that the slaves file does not have any extension.
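For illustration, a slaves file for a cluster with two slave systems could look like this (the addresses are made-up examples):

```
# SPARK_HOME/conf/slaves -- one slave IP address (or hostname) per line
192.168.1.101
192.168.1.102
```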

These files must be saved on all systems with the same data, i.e. the master system's IP address entered in the spark-env.sh file on every system, and the IP addresses of all the slave systems entered in the slaves file on every system as well. This is very important.
  • Navigate to the sbin directory inside the Spark folder (/usr/local/spark/sbin) and enter the following command:

$ sudo ./start-all.sh
  • Enter the password on the prompt
  • Go to your browser, enter IP_ADDRESS_OF_YOUR_MASTER_SYSTEM:8080 in the URL bar, and press Enter.
The sudo command is very important, or you will get a permission-denied error message.
Note:
The ./start-all.sh command is preferred to starting the master first and then starting each of the slaves afterwards; the references below show how that route works if you prefer it. I prefer this approach, where one command can start all the systems at once and then stop them afterwards.
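For reference, that step-by-step route looks roughly like this, using the start scripts shipped in SPARK_HOME/sbin (a sketch, assuming you run it on the master system):

```
# Start the master daemon first...
$ sudo ./start-master.sh
# ...then start a worker on every system listed in the slaves file
$ sudo ./start-slaves.sh
# To shut the whole cluster down again:
$ sudo ./stop-all.sh
```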

You might get an access error after running the start-all command, but I have forgotten how I overcame it. If you do get that error, please contact me and I will help resolve it. Enjoy.



Reference
http://paxcel.net/blog/how-to-setup-apache-spark-standalone-cluster-on-multiple-machine/
https://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm
