Apache Spark 1.4.0 on Raspberry PI 2 Cluster

This is the second post in a series about a Raspberry PI 2 BigData Cluster for OSX.

  1. Apache Spark on Raspberry Pi 2
  2. Apache Spark 1.4.0 on Raspberry PI 2 Cluster
  3. One Step Spark/Hadoop Installer for OSX v0.1.0
  4. Build Hadoop Cluster with 5 clicks


In the previous post, I showed you an RPI2 cluster running Apache Spark 1.1.1, which had been running for three months. Since Apache Spark 1.4.0 came out a few days ago, I've just upgraded the cluster.

Apache Spark 1.4.0 finally comes with SparkR. R holds such a strong position in the data science field that it is no surprise to see R and Spark come together. Among the many benefits this integration brings, DataFrame, the primary data structure for data processing in R, ranks at the top. This is great news, because one can expect higher-level R algorithms to eventually appear in SparkR. You can read more technical details in AMPLab's post.

Other features and improvements come along as well, such as early results from Project Tungsten, prettier job monitoring, and numerous bug fixes.

[Figure: Spark Web Console, Spark 1.4.0]

Since it takes time to collect all the bits from various places, I've prepared an RPI2 image for you below. The image also comes with extra goodies: Numpy and Scikit-Learn, the two giant pillars of the Python data science world, which usually take a few hours each to compile on an RPI2. Here, everything is already compiled and cleanly installed.

The following items are installed:

  1. Scala 2.11.6
  2. Hadoop 2.6.0
  3. Spark 1.4.0
  4. Numpy 1.9.2
  5. Scipy 0.15.1
  6. Scikit-Learn 0.16
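To confirm that a node booted from the image actually has the Python packages listed above, a quick check like the one below can help. It is a minimal sketch, not a tool shipped with the image. It assumes that Scikit-Learn is importable as `sklearn`, that each package exposes a `__version__` attribute, and that a patch release (for example 0.16.1) counts as matching the listed 0.16. All function names are hypothetical.

```python
# Sketch: compare the Python packages on a node with the versions listed above.
# Assumption: Scikit-Learn is imported under the module name "sklearn".
import importlib

# Expected versions, copied from the list of installed items in this post.
EXPECTED = {
    "numpy": "1.9.2",
    "scipy": "0.15.1",
    "sklearn": "0.16",
}


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_matches(found, expected):
    """True if found equals expected or is a patch release of it (0.16 -> 0.16.1)."""
    if found is None:
        return False
    return found == expected or found.startswith(expected + ".")


def check_versions(expected):
    """Map each module name to a (found_version, matches) tuple."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = (have, version_matches(have, want))
    return report


if __name__ == "__main__":
    # Print one line per package: expected version, found version, status.
    for name, (have, ok) in sorted(check_versions(EXPECTED).items()):
        print("{0:<8} expected {1:<7} found {2:<9} {3}".format(
            name, EXPECTED[name], have or "missing", "OK" if ok else "MISMATCH"))
```

Run it with the node's Python interpreter (for example `python check_versions.py`, where the file name is hypothetical). The prefix comparison keeps the check lenient about patch releases while still flagging a different minor version such as 0.17.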

Download RPI2 node image: 2015-06-21-rpi-spark140.img.7z

* [2015-11-08] A new Raspberry Pi image will be uploaded. The old image has been removed for now.

