This is the second post in a series about Raspberry Pi 2 BigData Cluster for OSX:
- Apache Spark on Raspberry Pi 2
- Apache Spark 1.4.0 on Raspberry PI 2 Cluster
- One Step Spark/Hadoop Installer for OSX v0.1.0
- Build Hadoop Cluster with 5 clicks
Apache Spark 1.4.0 finally comes with SparkR. R holds such a strong position in the data science field that it is no surprise to see R and Spark come together. Among the many benefits this integration brings, the DataFrame, the primary data structure for data processing in R, ranks at the top. This is great news, and one can expect higher-level R algorithms to appear in SparkR eventually. You can read more technical detail in AMPLab’s post.
Other features and improvements arrive with this release as well, such as early results from Project Tungsten, prettier job monitoring, and numerous bug fixes.
Since it takes time to collect all the bits from various places, I’ve compiled an RPI image for you below. The image also comes with extra goodies: Numpy and Scikit-Learn, the two giant pillars of Python data science land. Each usually takes a few hours to compile on an RPI2; here, both are compiled and cleanly installed.
The following is a summary of the installed items:
- Scala 2.11.6
- Hadoop 2.6.0
- Spark 1.4.0
- Numpy 1.9.2
- Scipy 0.15.1
- Scikit-Learn 0.16
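Once the image is booted, a quick sanity check of the Python packages might look like the sketch below. It only covers the three Python libraries from the list above (Scala, Hadoop, and Spark are not checked), and the helper names `installed_version`, `version_matches`, and `check` are made up for illustration. It assumes the packages expose a `__version__` attribute and that Scikit-Learn is imported as `sklearn`.

```python
import importlib

# Versions taken from the list in this post; keys are the usual import names.
EXPECTED = {
    "numpy": "1.9.2",
    "scipy": "0.15.1",
    "sklearn": "0.16",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_matches(found, expected):
    """True if `found` equals `expected` or extends it (0.16.1 matches 0.16)."""
    if found is None:
        return False
    return found == expected or found.startswith(expected + ".")


def check(expected=EXPECTED):
    """Map each module name to a tuple (expected, found, ok)."""
    report = {}
    for name, want in expected.items():
        found = installed_version(name)
        report[name] = (want, found, version_matches(found, want))
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in sorted(check().items()):
        status = "OK" if ok else "MISMATCH"
        print("%-8s expected %-7s found %-10s %s" % (name, want, found, status))
```

Running the script on a node prints one line per package. A missing package shows up as `None` with `MISMATCH` rather than crashing the script.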
Download RPI2 node image:
* [2015-11-08] A new Raspberry Pi image will be uploaded. The old image has been removed for now.