How To Install Apache Spark On Ubuntu

There’s so much data being generated in today’s world that it’s mind boggling. The field devoted to processing it is called Big Data Analysis. With so much data lying around, often ranging into petabytes and exabytes, we need very powerful systems to process it, and we need to do it with high efficiency. If you try to do it using your regular tools, you will never finish in time, let alone do it in real time. This is where Apache Spark comes into the picture. It is an open source big data processing framework that can process massive amounts of data at high speed using cluster computing. Let’s see how we can install it on Ubuntu.


The first step is to update the packages:

$ sudo apt-get update

Next, we need to install a JRE and a JDK. The following command will install the latest versions of OpenJRE and OpenJDK:

$ sudo apt-get install -y default-jre default-jdk

You need to install git (you’ll need it during the build process):

$ sudo apt-get install git

We are ready to proceed with the installation.
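Before moving on, it’s worth confirming that the Java toolchain actually landed on your PATH. A small sketch (the echoed messages are just illustrative):

```shell
# Check that both the runtime (java) and compiler (javac) are available
if command -v java >/dev/null 2>&1 && command -v javac >/dev/null 2>&1; then
  echo "Java toolchain ready"
else
  echo "Java toolchain missing"
fi
```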

Download Spark

Go to this site and choose the following options:

  • Choose a Spark release: pick the latest
  • Choose a package type: Source code [can build several Hadoop versions]
  • Choose a download type: Select Apache mirror

You will see “Download Spark” below it and a link next to it, but note that this is NOT the final download link. Click on this link and it will take you to a webpage. There will be a download link at the top. This is the link we need to use to download:

$ wget

Install Scala

Spark is written in Scala, so we need to install Scala to build Spark. Download the latest stable version of Scala from here. Don’t download any versions with “-M1”, “-M2”, etc. Run the following commands to download it and place it in the right directory:

$ wget
$ sudo mkdir /usr/local/src/scala
$ sudo tar xvf scala-2.10.6.tgz -C /usr/local/src/scala/

Go to the end of your “~/.bashrc” file and add the following lines:

export SCALA_HOME=/usr/local/src/scala/scala-2.10.6
export PATH=$SCALA_HOME/bin:$PATH

Reload the “~/.bashrc” file:

$ . ~/.bashrc
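To confirm the exports took effect in the current shell, you can check that Scala’s bin directory ended up on your PATH. A minimal sketch, assuming the scala-2.10.6 layout used above:

```shell
# Same exports as in ~/.bashrc above
export SCALA_HOME=/usr/local/src/scala/scala-2.10.6
export PATH=$SCALA_HOME/bin:$PATH

# Verify that the Scala bin directory is on PATH
case ":$PATH:" in
  *":$SCALA_HOME/bin:"*) echo "Scala is on PATH" ;;
  *) echo "Scala is NOT on PATH" ;;
esac
```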

Check if Scala is installed successfully by running the following command:

$ scala -version

You should see the following on your terminal:

Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL

Build Spark

We are ready to build Spark now. Note that it will take a while to build, so you need to be patient:

$ cd /path/to/spark-1.5.1
$ sbt/sbt assembly

Once it’s done, run the following command to check if everything is good:

$ ./bin/run-example SparkPi 10

A lot of log output will be printed to the terminal. Somewhere in there, you should see “Pi is roughly 3.141108”. Spark will print all these log messages every time we run something. To avoid that, go into the “spark-1.5.1” directory and run the following command on the terminal:

$ cp conf/log4j.properties.template conf/log4j.properties

Open the newly created “conf/log4j.properties” file and replace the following line:

log4j.rootCategory=INFO, console

with:

log4j.rootCategory=ERROR, console
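If you prefer to script this substitution rather than edit the file by hand, `sed` can do it in one line. A sketch, demonstrated on a scratch copy (the `/tmp` path is just for illustration) so it’s safe to try anywhere:

```shell
# Scratch copy standing in for the real log4j config (only the relevant line)
printf 'log4j.rootCategory=INFO, console\n' > /tmp/log4j.demo.properties

# Flip the root logger level from INFO to ERROR
sed -i 's/^log4j.rootCategory=INFO, console$/log4j.rootCategory=ERROR, console/' /tmp/log4j.demo.properties

cat /tmp/log4j.demo.properties
```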

Save the file and exit. Now run the following on your terminal:

$ ./bin/run-example SparkPi 10

You will see only “Pi is roughly 3.141108” printed on the terminal. We are now ready to roll! You can start the Python shell by running the following command:

$ ./bin/pyspark

You can run all the Python commands from this shell to make Spark do all the magic!

