Apache Spark 1.4.0 on Raspberry PI 2 Cluster

This is the second post in a series about a Raspberry PI 2 BigData cluster for OSX.

  1. Apache Spark on Raspberry Pi 2
  2. Apache Spark 1.4.0 on Raspberry PI 2 Cluster
  3. One Step Spark/Hadoop Installer for OSX v0.1.0
  4. Build Hadoop Cluster with 5 clicks

 

In the previous post, I showed you an RPI2 cluster running Apache Spark 1.1.1 that had been up for three months. Since Apache Spark 1.4.0 came out a few days ago, I’ve just upgraded the cluster.

Apache Spark 1.4.0 finally comes with SparkR. R holds such a strong position in the data science field that it is no surprise to see R and Spark merge. Among the many benefits this integration brings, DataFrame support ranks at the top, since the data frame is the primary data structure for data processing in R. This is great news, and one can expect higher-level R algorithms to eventually appear in SparkR. You can read more technical details in AMPLab’s post.

Other features and improvements also arrive with this release, such as early results of Project Tungsten, prettier job monitoring, and numerous bug fixes.

Spark Web Console (Spark 1.4.0)

Since it takes time to collect all the bits from various places, I’ve compiled an RPI image for you below. The image also comes with extra goodies: Numpy and Scikit-Learn, the two giant pillars of the Python data science world. Each usually takes a few hours to compile on an RPI2; here, both are already compiled and cleanly installed.

Here is a summary of the installed items:

  1. Scala 2.11.6
  2. Hadoop 2.6.0
  3. Spark 1.4.0
  4. Numpy 1.9.2
  5. Scipy 0.15.1
  6. Scikit-Learn 0.16
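After flashing the image, one quick way to confirm that the Python side matches the list above is to import each package and read its __version__ attribute. The following is a minimal sketch, assuming it is run with the node’s own Python interpreter; the helper names (installed_version, check_versions) are hypothetical, and it only covers the Python packages. Scala, Hadoop and Spark have to be checked with their own command-line tools, for example hadoop version or spark-submit --version.

```python
import importlib

# Versions listed for the 2015-06-21 image; "sklearn" is the import name of Scikit-Learn.
EXPECTED = {"numpy": "1.9.2", "scipy": "0.15.1", "sklearn": "0.16"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(expected):
    """Return a list of (name, expected, found) tuples for every mismatch."""
    mismatches = []
    for name, want in sorted(expected.items()):
        found = installed_version(name)
        # Accept an exact match or a longer patch version, e.g. "0.16.1" for "0.16"
        if found is None or not (found == want or found.startswith(want + ".")):
            mismatches.append((name, want, found))
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    if not problems:
        print("All Python packages match the image list.")
    for name, want, found in problems:
        print("%s: expected %s, found %s" % (name, want, found))
```

The prefix match is a deliberate choice: a node that reports Scikit-Learn 0.16.1 still counts as matching the "0.16" entry, while a missing package shows up as None.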

Download RPI2 node image: 2015-06-21-rpi-spark140.img.7z

*  [2015-11-08] A new Raspberry Pi image will be uploaded. The old image has been removed for now.

 


Apache Spark on Raspberry Pi 2

This is the first post in a series about a Raspberry PI 2 BigData cluster for OSX.

  1. Apache Spark on Raspberry Pi 2
  2. Apache Spark 1.4.0 on Raspberry PI 2 Cluster
  3. One Step Spark/Hadoop Installer for OSX v0.1.0
  4. Build Hadoop Cluster with 5 clicks

 

Courtesy of RPI Foundation

I think the Raspberry Pi is a great platform for experiencing how a small computer can make a difference in every little corner. Here is my own example. I am a programmer, and with the recent data science frenzy, I got my hands on Apache Spark on Hadoop. The catch with this combination is that you need a cluster of computers, at least three, to really find out how it works.

There are basically two options: the cloud or a cluster of my own. Long story short (this could get lengthy; for example, look here), neither really seemed to fully address my needs. Then, last February, a piece of news hit me: Windows 10 on Raspberry Pi 2! “It must run Spark!” I thought.

Having run Spark/Hadoop on a cluster of six Raspberry Pi 2 boards for about three months now, I can say my hunch paid off.

WordCount on RPI2 cluster
Spark WebUI
RPI2 Spark cluster running
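A WordCount job like the one in the screenshots follows the classic Spark pattern: split lines into words, map each word to a (word, 1) pair, then reduce by key to sum the counts per word. As a minimal sketch that runs without a cluster, the plain-Python code below mirrors those three steps locally; the function names are hypothetical stand-ins for Spark’s flatMap, map and reduceByKey, not the actual job that ran on the RPI2 nodes.

```python
from collections import defaultdict


def flat_map_words(lines):
    # flatMap step: split every line into whitespace-separated words
    for line in lines:
        for word in line.split():
            yield word


def map_pairs(words):
    # map step: emit a (word, 1) pair for every word
    for word in words:
        yield (word, 1)


def reduce_by_key(pairs):
    # reduceByKey step: sum the values that share the same key
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(lines):
    return reduce_by_key(map_pairs(flat_map_words(lines)))


if __name__ == "__main__":
    sample = ["spark on pi", "pi cluster spark spark"]
    print(sorted(word_count(sample).items()))
    # prints [('cluster', 1), ('on', 1), ('pi', 2), ('spark', 3)]
```

On the cluster itself, the same pipeline would be written against an RDD, roughly sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b), where sc and path are placeholders for your SparkContext and input file. Spark then distributes the map steps across the nodes and shuffles pairs with the same key together for the reduce.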

Finally, here is the RPI2 image with the following items:

  • Scala 2.11
  • Hadoop 2.6
  • Spark 1.3.1

Download RPI2 Node Image: 2015-05-31_rpi-spark.img.7z

Next time, I will continue with how to set up a Raspberry Pi 2.

*  [2015-11-08] A new Raspberry Pi image will be uploaded. The old image has been removed for now.

 
