The amount of data being generated in today’s world is mind-boggling, and we need dedicated platforms and frameworks to handle it. This field of study is called Big Data Analysis. With so much data lying around, often ranging in petabytes and exabytes, we need powerful systems to process it, and we need to do it with high efficiency. If you try to do it with conventional tools, you will never finish in time, let alone do it in real time. This is where Apache Spark comes into the picture. It is an open source big data processing framework that can process massive amounts of data at high speed using cluster computing. Let’s see how we can install it on Ubuntu.
Prerequisites
The first step is to update the packages:
$ sudo apt-get update
We need to install a JRE and JDK. The following command will install the default OpenJDK runtime and development kit:
$ sudo apt-get install -y default-jre default-jdk
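Once that finishes, it’s worth sanity-checking the install. The following commands should print the installed versions (the exact output depends on which OpenJDK your Ubuntu release ships):

$ java -version
$ javac -version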
You need to install git (you’ll need it during the build process):
$ sudo apt-get install git
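As a quick check that git is available on your PATH, you can run:

$ git --version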
We are ready to proceed with the installation.
Download Spark
Go to this site and choose the following options:
- Choose a Spark release: pick the latest
- Choose a package type: Source code [can build several Hadoop versions]
- Choose a download type: Select Apache mirror
Below these options you will see “Download Spark” with a link next to it. Note that this is NOT the final download link. Clicking it takes you to a mirror page, and the download link at the top of that page is the one we need to use:
$ wget http://www.us.apache.org/dist/spark/spark-1.5.1/spark-1.5.1.tgz
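Once the download finishes, extract the tarball (the directory name below assumes the 1.5.1 source release we just downloaded):

$ tar xvf spark-1.5.1.tgz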
Install Scala
Spark is written in Scala, so we need to install Scala to build Spark. Download the latest stable version of Scala from here. Don’t download any versions with “-M1”, “-M2”, etc. (those are milestone releases). Run the following commands to download it and place it in the right directory:
$ wget http://www.scala-lang.org/files/archive/scala-2.10.6.tgz
$ sudo mkdir /usr/local/src/scala
$ sudo tar xvf scala-2.10.6.tgz -C /usr/local/src/scala/
Go to the end of your “~/.bashrc” file and add the following lines:
export SCALA_HOME=/usr/local/src/scala/scala-2.10.6
export PATH=$SCALA_HOME/bin:$PATH
Reload the “~/.bashrc” file:
$ . ~/.bashrc
Check if Scala is installed successfully by running the following command:
$ scala -version
You should see the following on your terminal:
Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL
Build Spark
We are ready to build Spark now. Note that it will take a while to build, so you need to be patient:
$ cd /path/to/spark-1.5.1
$ sbt/sbt assembly
Once it’s done, run the following command to check if everything is good:
$ ./bin/run-example SparkPi 10
A lot of log output will be printed on the terminal. Somewhere in there, you should see “Pi is roughly 3.141108”. Spark will print all these log messages every time we run something. To quiet that down, go into the “spark-1.5.1” directory and run the following command on the terminal:
$ cp conf/log4j.properties.template conf/log4j.properties
Open the newly created “conf/log4j.properties” file and replace the following line:
log4j.rootCategory=INFO, console
by
log4j.rootCategory=ERROR, console
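Alternatively, if you prefer not to open an editor, a one-line sed command along these lines should perform the same replacement (it assumes the default template line shown above):

$ sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=ERROR, console/' conf/log4j.properties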
Save the file and exit. Now run the following on your terminal:
$ ./bin/run-example SparkPi 10
You will see only “Pi is roughly 3.141108” printed on the terminal. We are now ready to roll! You can start the Python shell by running the following command:
$ ./bin/pyspark
You can run all the Python commands from this shell to make Spark do all the magic!
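As a quick first session, here is a tiny example you can try. The pyspark shell creates a SparkContext for you named sc, so something along these lines should work (the numbers are just placeholders for illustration):

>>> data = sc.parallelize(range(1, 1001))        # distribute a list of numbers across the cluster
>>> data.filter(lambda x: x % 2 == 0).count()    # count the even numbers
500
>>> data.map(lambda x: x * x).take(5)            # square each number and peek at the first five
[1, 4, 9, 16, 25]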