The Next Version of PocketCluster

It has been quite some time since the PocketCluster application and the Raspberry PI image disappeared from the download page, and many have asked when they will be available again.

Let’s rewind the clock a bit. The original version of PocketCluster was written to build an Apache Hadoop + Spark cluster with Raspberry PIs and a Mac. Back in late 2015, when Google TensorFlow became available to the public, it also became clear that PocketCluster needed to handle more than one cluster framework for its users to properly execute whatever task was at hand. To put this in perspective, PocketCluster Index tracks 129 frameworks, not to mention more than 1,200 libraries, toolsets, and models.

When installing Apache Spark with PocketCluster could take several hours in the worst case, the new requirement on the horizon simply overstretched it. Business-as-usual patchwork upgrades here and there could have resolved the issue only if there had been a sound foundation. Otherwise, such duct-tape measures would never have brought PocketCluster anywhere near where you would want it to be. It badly needed to be rebuilt with certain goals in mind, and that rebuild has been underway for some time.

The following are the changes made so far. We can talk about the goals in a later post.

Same Simple Installation


Your Mac is the master node of your cluster, and Raspberry Pi (RPI) devices are slave nodes. Bake the provided image onto SD cards and boot up the RPIs. Drag and drop PocketCluster into the Applications folder, then double-click it: no black-and-white terminal, no commands to copy and paste, no wall of text. This is 2017, after all.


All-In-One, Ready-made, Out-of-the-Box Package



Cluster frameworks such as Apache Spark or Hadoop come in a package that needs no extra configuration steps. Spend your time focusing on your main task, and let PocketCluster take care of all the small, tedious chores.

Drastically Reduced Installation Time

With previous versions, installing a package could take up to several hours in some cases, and it often failed completely. The new version will complete installing a Hadoop + Spark + Jupyter package across your entire cluster within half an hour.

Secure Network Connection

Most PocketCluster network connections are encrypted. This is done not only to protect your cluster from malicious infiltration attempts, but also to give you a shielded environment. As a result, multiple clusters can coexist in a workplace or home, while your cluster operates just for you and nobody else.

All 64-bit Kernel


PocketCluster runs the RPI3 with a 64-bit kernel. Previous versions ran on a 32-bit kernel, which significantly hampered the ability to handle large data. One might argue that the RPI3 has only 1 GB of memory, so there is no point in deploying such a memory-hungry kernel.

The kernel shift comes with a plan. Besides the RPI3, three more single-board computer models in about the same price range have been evaluated, and one or two of them will be added to the list of supported devices. They have significantly more memory and I/O capacity, which should noticeably improve your experience with PocketCluster.
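To see which kind of kernel a node is actually running, the architecture string the kernel reports is usually enough. The sketch below is a hypothetical helper, not part of PocketCluster; it assumes a Linux node where `platform.machine()` (the same value `uname -m` prints) returns `aarch64` for a 64-bit ARM kernel and `armv7l` or `armv6l` for a 32-bit one. It also spells out the arithmetic behind the 32-bit limitation: a 32-bit pointer can address at most 2^32 bytes, i.e. 4 GiB per process.

```python
import platform

GIB = 2 ** 30

# Assumption: these are the `uname -m` strings reported by 64-bit kernels.
SIXTY_FOUR_BIT_MACHINES = {"aarch64", "arm64", "x86_64", "amd64"}


def is_64bit_kernel(machine: str) -> bool:
    """True if the machine string reported by the kernel denotes a 64-bit architecture."""
    return machine.lower() in SIXTY_FOUR_BIT_MACHINES


def address_space_gib(pointer_bits: int) -> int:
    """Theoretical virtual address space for a given pointer width, in GiB."""
    return 2 ** pointer_bits // GIB


if __name__ == "__main__":
    machine = platform.machine()
    print(f"machine={machine!r}, 64-bit kernel: {is_64bit_kernel(machine)}")
    print(f"32-bit address space: {address_space_gib(32)} GiB")  # 4 GiB
```

In practice a 32-bit Linux process often gets less than the full 4 GiB, because the kernel reserves part of the address space for itself.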

A Few Words on Missing Regular Updates


This is not because posting regular progress updates has the lowest priority; in fact, it is exactly the opposite. The rebuild so far strongly resembles repairing a road riddled with thousands of small but deep cracks. You have to fill every one of them and make the surface smooth enough for cars to fly through without drivers feeling a single bump. There have been many moments when a crack suddenly went far deeper than foreseen. It was difficult to post updates during those moments, so the weekly round-ups stood in for progress updates, like a pulse showing a sign of life.

Even at this point, many corners have been cut to meet a tentative timeline: releasing the new version before Christmas this year. With a strong and sound foundation now in place, however, the PocketCluster experience will improve accordingly, and all of those corners will eventually be revisited and reworked.

Thank you very much for keeping your interest in PocketCluster, and stay tuned.

1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking to add your repo? Tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Aug. 17, 2017


Effective TensorFlow
TensorFlow tutorials and best practices

CommandCenter: StarCraft 2 AI Bot
CommandCenter is a StarCraft II playing bot written in C++ using Blizzard’s StarCraft II AI API


A Python tool for evaluating the quality of sentence embeddings.

PyEcharts is a library to generate charts using Echarts. It simply provides the interface between Echarts and Python.

Tensor Bridge
Tensor Bridge – OpenAPI spec and REST wrapper around TensorFlow Serving

Jupyter C Kernel
Minimal Jupyter C kernel

Karura enables you to use machine learning interactively

A TensorBoard plugin for visualizing arbitrary tensors in a video as your network trains.


MLlib Convolutional and Feedforward Neural Network implementation with a high level API and advanced optimizers.

A hardware-accelerated deep learning library for the web.

A library of tools to train and run neural networks for computer vision tasks using Chainer.

A deep reinforcement learning library built on top of Chainer.

Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow



Weekly BigData & ML Roundup – Apr. 27, 2017


Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples

Snowplow with Kafka
Example showing Snowplow Tracker and Collector writing to Kafka and being consumed from there

REST API for Text Summarization and Keywords Extraction

An artificial intelligence written entirely in JavaScript that recognises the font of text in an image using the Tesseract optical character recognition engine and some image processing libraries

How to learn AI / Deep learning / Machine Learning
A practical, top-down approach: start with high-level frameworks and work toward increasingly difficult problems, beginning with test problems on clean datasets and then moving toward real-world problems

Awesome Machine Learning with Ruby

Evaluation of Deep Learning Toolkits
This research was done in late 2015 with slight modifications in early 2016. Many toolkits have improved significantly since then


A bunch of code to port Keras neural network models into pure C++

Streamlining phylogenomic data gathering, processing and visualization

Desktop notebook app + packages


CycleGAN Models
Models generated by CycleGAN

Autosklearn Zeroconf
A fully automated binary classifier based on the AutoML challenge winner auto-sklearn

Exploration of methods for coloring t-SNE.


Module for automatic summarization of text documents and HTML pages.

fastText Multilingual
Multilingual word vectors in 78 languages


Weekly BigData & ML Roundup – Nov. 17, 2016

We have a pocketful of goodies today! Airbnb Caravel has been renamed Superset. Also, we have MLAlgorithm, which offers a rare opportunity to “learn internals of ml algorithms or implement them from scratch” with a simpler, easier codebase.


Investing S&P500
Investing Returns on the Market as a Whole


Interactive convnet features visualization for Keras

A set of useful perceptually uniform colormaps for plotting scientific data

Data Science Utils
Some wrappers around python modules for simplifying the data exploration process.


Place Recognition Using Autoencoders & NN
Place recognition with WiFi fingerprints using Autoencoders and Neural Networks

Neural Cryptography
Neural Networks that invent their own encryption

Miles Deep – AI Porn Video Editor
Deep Learning Porn Video Classifier/Editor with Caffe


Minimal and clean examples of machine learning algorithms

Airbnb Superset
Superset is a data exploration platform designed to be visual, intuitive, and interactive

WebGL based visualization layers

Statistics utilities for the JVM – in Scala!

* will move to on Jan 1, 2017. If you have subscribed to this blog, please make sure to change the feed address.

Looking to add your repo? Any suggestions or comments? Send your feedback to, or tweet to @stkim1!

Looking for more BigData or Machine Learning repositories? You can find a lot more tools, frameworks and libraries at PocketCluster Index.


Apache Spark 1.5.2 on Raspberry PI 2 cluster

This is the second post in a series about BigData clusters for OS X and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster
  3. The Next Version of PocketCluster

** update: Oct 2, 2017. The next version of PocketCluster is coming soon!

Once we have a foundation for building a Big Data analytics stack, the next piece that should come is Apache Spark. Spark speeds up MapReduce-based algorithms by executing computations in memory and optimizing the execution DAG. You can read more details in this paper.

Spark tremendously helps analytics computations, since such computations are iterative in nature. Suppose you have to read and write 100 GB of data on disk again and again; imagine how painfully slow that could be. (Some folks handle a couple of petabytes every day. Let’s not go there yet.) Spark, in contrast, keeps such operations in memory. You just ought to experience the difference. If you’re doing anything with Big Data analytics, Spark is something you will encounter whichever direction you go.
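The difference is easy to see even without a cluster. The pure-Python sketch below is an analogy, not Spark code: `CountingSource` is a hypothetical stand-in for a dataset on disk that counts full passes over it, and `fit_mean` is a toy iterative algorithm (gradient descent toward the mean) that asks for its input on every iteration. Re-reading the source each time models a disk-bound loop; reading once and reusing the in-memory copy models what calling `.cache()` on a Spark RDD or DataFrame buys you.

```python
from typing import Callable, List


class CountingSource:
    """Hypothetical stand-in for a dataset on disk; counts full reads."""

    def __init__(self, records: List[float]):
        self._records = records
        self.reads = 0

    def read(self) -> List[float]:
        self.reads += 1  # one call models one full pass over the data on disk
        return list(self._records)


def fit_mean(load: Callable[[], List[float]], iterations: int, lr: float = 0.5) -> float:
    """Toy iterative job: gradient descent on squared error, converging to the mean."""
    theta = 0.0
    for _ in range(iterations):
        data = load()  # the job asks for its input on every iteration
        grad = sum(theta - x for x in data) / len(data)
        theta -= lr * grad
    return theta


records = [1.0, 2.0, 3.0, 4.0]

# Disk-style: the source is read again on every iteration.
disk = CountingSource(records)
theta_disk = fit_mean(disk.read, iterations=10)

# Memory-style: read once, then reuse the in-memory copy (what caching gives you).
cached_source = CountingSource(records)
in_memory = cached_source.read()
theta_mem = fit_mean(lambda: in_memory, iterations=10)

print(disk.reads, cached_source.reads)  # 10 1
print(theta_disk == theta_mem)          # True: same answer, far fewer reads
```

With 100 GB instead of four numbers, every avoided read is a full scan of the disk, which is where the speed-up for iterative jobs comes from.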

In fact, the very first two posts of this blog are about running Spark on Raspberry PI 2 (henceforth RPI2). Nevertheless, it wasn’t much of a joy to build and run such a thing. If you’ve been following along, you know we’ve crossed some serious creeks. Now, here comes an OS X application that deploys Apache Spark & Hadoop with a few mouse clicks.


PocketCluster 0.1.3

As in the previous post, I’m going to play a video and then go over a few more details.

First of all, PocketCluster supports Vagrant and Raspberry PI 2 at the same time. If you want to carry a multi-node Big Data environment with you all the time, the Vagrant version is definitely recommended. The installation and operation process is exactly the same as the one depicted in the video.

On the other hand, I would recommend going with RPI2 if you’re working in a stationary environment. Six RPI2s could provide roughly the same amount of computing power as one Intel i7 processor. You’d be able to 1) quickly test your hypotheses or 2) debug your prototype in a real, multi-node environment.

(*Old generation Raspberry PI is not supported. Only Raspberry PI 2 is supported at the moment.)

Second, for PocketCluster to install and operate smoothly, you need a solid internet connection. All the software is downloaded and configured at run time, so a jumpy connection could really ruin your experience. I am working on improving this.

Third, Spark supports five different modes of operation: 1) standalone, 2) pseudo-cluster, 3) standalone cluster, 4) YARN client, and 5) Mesos client. At the moment, PocketCluster supports standalone cluster mode only.
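In practice, the mode is selected by the `--master` URL passed to `spark-submit`. The sketch below only builds the command line; the mode labels and the host name `pocket-master` are hypothetical, and it covers the local, standalone cluster, YARN, and Mesos cases (the pseudo-cluster mode is left out). The ports are the defaults for a Spark standalone master (7077) and a Mesos master (5050). Note that "standalone cluster" refers to Spark's built-in cluster manager; the separate `--deploy-mode` flag decides whether the driver runs on the submitting machine (`client`) or inside the cluster (`cluster`).

```python
from typing import List

STANDALONE_PORT = 7077  # default port of a Spark standalone master
MESOS_PORT = 5050       # default port of a Mesos master


def master_url(mode: str, host: str = "localhost") -> str:
    """Map a (hypothetical) mode label to a spark-submit --master value."""
    if mode == "local":
        return "local[*]"  # everything in one JVM, one worker thread per core
    if mode == "standalone-cluster":
        return f"spark://{host}:{STANDALONE_PORT}"  # Spark's built-in cluster manager
    if mode == "yarn-client":
        return "yarn"  # ResourceManager is found via HADOOP_CONF_DIR / YARN_CONF_DIR
    if mode == "mesos-client":
        return f"mesos://{host}:{MESOS_PORT}"
    raise ValueError(f"unknown mode: {mode!r}")


def submit_command(mode: str, app: str, host: str = "localhost") -> List[str]:
    """Build a spark-submit argument list; the driver runs on the submitting machine."""
    return ["spark-submit", "--master", master_url(mode, host),
            "--deploy-mode", "client", app]


# Hypothetical master host name and application file.
cmd = submit_command("standalone-cluster", "wordcount.py", host="pocket-master")
print(" ".join(cmd))
# spark-submit --master spark://pocket-master:7077 --deploy-mode client wordcount.py
```

Spark 1.5-era documentation spells the YARN client case as `--master yarn-client`; newer releases use `--master yarn --deploy-mode client`, as above. Passing the list to `subprocess.run` would launch the job, assuming `spark-submit` is on the `PATH`.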

Lastly, as soon as the Spark installation completes, SparkR is configured to run across the slave nodes. It’s just that the Homebrew installation of R takes forever, since it needs to compile gcc to provide Fortran for R. Hence, should you need SparkR, just type the following in a Terminal or iTerm shell on your Mac after installation. (*It could take about 40 minutes.)

brew tap homebrew/science && brew install r && brew untap homebrew/science

For more detailed instructions about PocketCluster installation, please go to my previous post.

Here comes PocketCluster 0.1.3 again. I’m looking for the next package to install. If you have a suggestion, please leave a comment below or tweet me @stkim1.



Build Hadoop Cluster with 5 clicks

This is the first post in a new series about BigData clusters for OS X and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster
  3. The Next Version of PocketCluster

* update: Dec 5, 2015. PocketCluster is updated to 0.1.3 for Apache Spark!
** update: Oct 2, 2017. The next version of PocketCluster is coming soon!

It was back in February of this year that the first Raspberry PI 2 (henceforth RPI2) cluster was built and Apache Spark ran on it. Back then, it was mostly an experiment and a bit of fun, since only a handful of folks had given a second thought to putting RPIs to that sort of use.

After I published the RPI2 cluster case (part 1, part 2, open schematic), things changed a tiny bit. Many had a good laugh at the cluster, and some were enthusiastic about its possibilities.

In fact, Makezine and MagPi (pg #28) covered the case, and many retweeted the post and downloaded the schematics. Nonetheless, one question rang so strongly that it motivated me to finish what I had started back in July.

The question suggests its author believed there was no clear way to put the power of multiple RPI2s to good use. I, for one, was a bit shocked, since there was indeed a good use case right in front of me. If I may be so bold, I’d like to shift the viewpoint just a bit by presenting a dead-simple way to build a BigData cluster on a MacBook with RPI2s for experiments and experiences.

Let’s first go through the existing solutions; we don’t want to reinvent the wheel. One can easily point out that we already have Cloudera, Hortonworks, MapR, and Apache Ambari to automate the tedious setup process.

I would agree, within a specific domain. Say we are working with multiple racks of powerful datacenter-grade nodes and need enterprise-level support. Then they are definitely the answer, and I’d encourage you to click the links. They are, however, designed to work on powerful machines, not on a MacBook or RPI2s.

Another might hand us multiple links on how to install Hadoop and Spark after 5 seconds of Google search. Here is actually one you can try on RPI2. (Thank you Jonas!) It isn’t much fun, though, to open up text file after text file on RPI2 after RPI2.

Shouldn’t there be something that just works, is lightweight, and gets the tedious installation process out of your way when you just want to play with a BigData cluster on a MacBook and RPI2s?


Here comes PocketCluster (ver 0.1.3).

It builds you a Hadoop cluster with 3 slave nodes for Vagrant/VirtualBox, or up to 6 slave nodes for Raspberry PI 2, in 5 mouse clicks. (Yes, PocketCluster builds Hadoop clusters on two different platforms.) It takes no command-line configuration to install and run.

Seeing is believing. Watch the videos below and chill with me.

Vagrant Cluster (3-Nodes)

Vagrant Cluster is designed to work without a single Raspberry PI 2. PocketCluster will create 3 slave nodes based on Vagrant + VirtualBox and use OS X as the master. This variation is for those of you who have a MacBook and want to carry a multi-node environment around all the time. Note that you need at least 3 GB of memory and 9 GB of free disk space. (A Mac with at least 8 GB of memory is recommended.)

Install the following prerequisites first, and make sure the Remote Login service is enabled. Copy PocketCluster into the Applications folder. Then you are all set to go.

  1. Java 1.8
  2. Homebrew
  3. VirtualBox 5.0.10
  4. Vagrant 1.7.4
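Before launching PocketCluster, a short pre-flight script can confirm that the list above is in place. This is a hypothetical helper, not something PocketCluster ships: it assumes the tools are on the `PATH` under their usual command names (`java`, `brew`, `VBoxManage`, `vagrant`), treats the listed versions as minimums, and checks free disk space against the 9 GB mentioned for the Vagrant cluster. It does not check the Remote Login setting, which you can verify in the Sharing preferences.

```python
import os
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, ...]

# (command, version flag, minimum version or None for "just needs to exist")
CHECKS = [
    ("java", "-version", (1, 8)),
    ("brew", "--version", None),
    ("VBoxManage", "--version", (5, 0, 10)),
    ("vagrant", "--version", (1, 7, 4)),
]
MIN_FREE_BYTES = 9 * 10**9  # 9 GB of free disk space for the Vagrant cluster


def parse_version(text: str) -> Optional[Version]:
    """Extract the first dotted version, e.g. 'Vagrant 1.7.4' -> (1, 7, 4)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else None


def installed_version(cmd: str, flag: str) -> Optional[Version]:
    """Run `cmd flag` and parse its output; `java -version` writes to stderr."""
    result = subprocess.run([cmd, flag], capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)


def main() -> None:
    for cmd, flag, minimum in CHECKS:
        if shutil.which(cmd) is None:
            print(f"{cmd}: missing")
        elif minimum is None:
            print(f"{cmd}: found")
        else:
            found = installed_version(cmd, flag)
            ok = found is not None and found >= minimum  # tuples compare element by element
            print(f"{cmd}: {found} ({'ok' if ok else 'too old or unrecognized'})")
    free = shutil.disk_usage(os.path.expanduser("~")).free
    print(f"free disk: {free / 10**9:.1f} GB ({'ok' if free >= MIN_FREE_BYTES else 'not enough'})")


if __name__ == "__main__":
    main()
```

Comparing version tuples keeps the check simple: `(1, 8, 0) >= (1, 8)` and `(5, 0, 10) >= (5, 0, 10)` both hold, while `(5, 0, 9)` falls short of `(5, 0, 10)`.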

All of these requirements are commonly used, and many people already have them installed on their Macs. I strongly recommend updating them to the latest version. The installation itself could take up to 15 minutes, mostly spent downloading files, so make sure your internet connection is solid. 😉

Raspberry PI 2 Cluster (Up to 6 Nodes)

Here comes the fun part. The RPI2 cluster does not require Vagrant/VirtualBox, since it uses *real* nodes, which execute distributed jobs and store data. PocketCluster will set up as many as 6 Raspberry PI 2s as slave nodes and use your Mac as the master. Whatever challenges you experience in this environment, you will also encounter in a datacenter, a cloud, and/or an in-house cluster.

(*Old generation Raspberry PI is not supported. Only Raspberry PI 2 is supported at this point.)

Operating the RPI2 cluster does not need as much memory or disk space as the Vagrant cluster. I’d say it is safe to operate the cluster on a Mac with 4 GB of memory.

The RPI2 cluster requires an Ethernet connection. (It does not work over Wi-Fi.) If PocketCluster runs on a MacBook, you’d need one of the goodies below. (iMac, Mac mini, and Mac Pro do not need an adapter.)

Make sure all the RPI2s are behind the same router as the Mac, and that the Remote Login service is enabled only for your account.
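A rough way to confirm that the RPI2s sit behind the same router is to check that their addresses fall inside the Mac's local subnet. The helper below is a hypothetical sketch using Python's standard `ipaddress` module; the Mac address `192.168.1.10/24` and the node addresses are made-up examples, and on the Mac you would read the real address and netmask from the Network preferences or `ifconfig`. Sharing a subnet is a heuristic rather than proof of a shared router, but it catches the common case of a node sitting on a different network.

```python
import ipaddress
from typing import Iterable, List


def outside_subnet(mac_interface: str, node_ips: Iterable[str]) -> List[str]:
    """Return node addresses that are not on the Mac's subnet.

    mac_interface: the Mac's Ethernet address with prefix length or netmask,
    e.g. '192.168.1.10/24' or '192.168.1.10/255.255.255.0'.
    """
    network = ipaddress.ip_interface(mac_interface).network
    return [ip for ip in node_ips if ipaddress.ip_address(ip) not in network]


# Hypothetical addresses: two RPI2s on the LAN and one that is not.
strays = outside_subnet("192.168.1.10/24", ["192.168.1.21", "192.168.1.22", "10.0.0.5"])
print(strays)  # ['10.0.0.5']
```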

First, download the RPI2 Ubuntu image below and bake it onto an SD card (an SD card of at least 8 GB is recommended). Everything you need is already installed and configured. Slide the baked SD cards into the RPI2 slots and power up the RPI2s.


Then install the following prerequisites and copy PocketCluster into the Applications folder. You are good to go.

  1. Java 1.8
  2. Homebrew

The installation itself could take 15 minutes or longer, depending on your internet connection. Make sure your internet connection is solid.

Here comes the download page again. PocketCluster (ver 0.1.3).

This is indeed a continuation of my previous post. The point of having a Big Data cluster on your Mac boils down to the following:

  • Friendlier Environment.
  • Experiments and Experiences.
  • Multi-nodes cluster.

Vagrant cluster is more suited when you literally want to carry a cluster around in your pocket. On the other hand, RPI2 cluster will give you a real-life environment and challenges.

Since this is barely version 0.1.2, there are lots of improvements to be made, and more BigData software packages to be added. (Apache Spark is planned to be added next time!) If you have another suggestion or question, please leave a comment below. You can also tweet me @stkim1.
