Weekly BigData & ML Roundup – Aug. 17, 2017


Effective TensorFlow
TensorFlow tutorials and best practices

CommandCenter: StarCraft 2 AI Bot
CommandCenter is a StarCraft II playing bot written in C++ using Blizzard’s StarCraft II AI API


A Python tool for evaluating the quality of sentence embeddings.

PyEcharts is a library to generate charts using Echarts. It simply provides the interface between Echarts and Python.

Tensor Bridge
Tensor Bridge – OpenAPI spec and REST wrapper around TensorFlow Serving

Jupyter C Kernel
Minimal Jupyter C kernel

Karura enables you to use machine learning interactively

A TensorBoard plugin for visualizing arbitrary tensors in a video as your network trains.


MLlib Convolutional and Feedforward Neural Network implementation with a high level API and advanced optimizers.

A hardware-accelerated deep learning library for the web.

A library of tools to train and run neural networks for computer vision tasks using Chainer.

A deep reinforcement learning library built on top of Chainer.

Implementation of Restricted Boltzmann Machine (RBM) and its variants in TensorFlow


1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking to add your repo? Tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Apr. 27, 2017


Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples

Snowplow with Kafka
Example showing Snowplow Tracker and Collector writing to Kafka and being consumed from there

REST API for Text Summarization and Keywords Extraction

An artificial intelligence written entirely in JavaScript that recognises the font of a text in an image using the Tesseract optical character recognition engine and some image processing libraries

How to learn AI / Deep learning / Machine Learning
A practical, top-down approach, starting with high-level frameworks and moving to increasingly difficult problems, beginning with test problems on clean datasets and then moving towards real-world problems

Awesome Machine Learning with Ruby
Minimal and Clean Reinforcement Learning Examples

Evaluation of Deep Learning Toolkits
This research was done in late 2015 with slight modifications in early 2016. Many toolkits have improved significantly since then


This is a bunch of code to port a Keras neural network model into pure C++

Streamlining phylogenomic data gathering, processing and visualization

Desktop notebook app + packages


CycleGAN Models
Models generated by CycleGAN

Autosklearn Zeroconf
A fully automated binary classifier based on the AutoML challenge winner auto-sklearn

Exploration of methods for coloring t-SNE.


Module for automatic summarization of text documents and HTML pages.

fastText Multilingual
Multilingual word vectors in 78 languages


Weekly BigData & ML Roundup – Nov. 17, 2016

We have a pocketful of goodies today! Airbnb Caravel has been renamed to Superset. Also, we have MLAlgorithm, which offers a rare opportunity to “learn internals of ml algorithms or implement them from scratch” with a simpler, easier codebase.


Investing S&P500
Investing Returns on the Market as a Whole


Interactive convnet features visualization for Keras

A set of useful perceptually uniform colormaps for plotting scientific data

Data Science Utils
Some wrappers around python modules for simplifying the data exploration process.


Place Recognition Using Autoencoders & NN
Place recognition with WiFi fingerprints using Autoencoders and Neural Networks

Neural Cryptography
Neural Networks that invent their own encryption

Miles Deep – AI Porn Video Editor
Deep Learning Porn Video Classifier/Editor with Caffe


Minimal and clean examples of machine learning algorithms

Airbnb Superset
Superset is a data exploration platform designed to be visual, intuitive, and interactive

Uber Deck.gl
WebGL based visualization layers

Statistics utilities for the JVM – in Scala!

*https://pocketcluster.wordpress.com will move to https://blog.pocketcluster.io on Jan 1, 2017. If you have subscribed to this blog, please make sure to change the feed address.

Looking to add your repo? Any suggestions or comments? Send your feedback to stkim1@pocketcluster.io, or tweet to @stkim1!

Looking for more BigData or Machine Learning repositories? You can find a lot more tools, frameworks and libraries at PocketCluster Index.


Apache Spark 1.5.2 on Raspberry PI 2 cluster

This is the second post of a series about BigData cluster for OS X and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster


With a foundation for a Big Data analytics stack in place, the next piece to add is Apache Spark. Spark speeds up MapReduce-based algorithms by executing the computations in memory and optimizing the job as a DAG (directed acyclic graph). You can read more details in this paper.

Spark helps analytics tremendously because such computations are iterative in nature. Imagine reading and writing 100 GB of data on disk again and again, and how painfully slow that would be. (Some folks handle a couple of petabytes every day. Let’s not go there yet.) Spark, in contrast, handles such operations in memory. You just have to experience the difference. If you do anything with Big Data analytics, Spark is something you will run into whichever direction you go.
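To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Only the 100 GB figure comes from the paragraph above. The throughput values and the iteration count are assumptions picked purely for illustration, not measurements of Spark or of any particular machine, and the sketch counts reads only.

```python
# Back-of-the-envelope estimate: re-reading a dataset from disk on every
# iteration vs. reading it once and keeping it in memory afterwards.
# All throughput numbers below are ASSUMED for illustration, not measured.

DATA_GB = 100            # dataset size mentioned in the post
ITERATIONS = 10          # assumed number of passes of an iterative algorithm
DISK_GB_PER_SEC = 0.1    # assumed sequential disk throughput (~100 MB/s)
MEM_GB_PER_SEC = 10.0    # assumed memory throughput


def disk_every_pass(data_gb, iterations, disk_rate):
    """Seconds spent reading when every pass re-reads the data from disk."""
    return iterations * data_gb / disk_rate


def cached_in_memory(data_gb, iterations, disk_rate, mem_rate):
    """Seconds spent reading when pass 1 loads from disk and the rest hit memory."""
    first_pass = data_gb / disk_rate
    later_passes = (iterations - 1) * data_gb / mem_rate
    return first_pass + later_passes


if __name__ == "__main__":
    slow = disk_every_pass(DATA_GB, ITERATIONS, DISK_GB_PER_SEC)
    fast = cached_in_memory(DATA_GB, ITERATIONS, DISK_GB_PER_SEC, MEM_GB_PER_SEC)
    print(f"disk on every pass:  {slow / 60:.0f} min")
    print(f"cached after pass 1: {fast / 60:.0f} min")
```

The more passes an algorithm makes over the same data, the wider this gap gets, which is why iterative workloads benefit the most.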

In fact, the very first two posts of this blog are about running Spark on Raspberry PI 2 (henceforth RPI2). Building and running it was not much of a joy, though. If you’ve been with me, you know we’ve crossed some serious creeks. Now, here comes an OS X application that deploys Apache Spark & Hadoop with a few mouse clicks.


PocketCluster 0.1.3

Just like in the previous post, I’m going to play a video and talk through a few more details.


First of all, PocketCluster supports Vagrant and Raspberry PI 2 at the same time. If you want to carry a multi-node Big Data environment with you all the time, it is definitely recommended to go with the Vagrant version. The installation and operation process is exactly the same as the one depicted in the video.

Meanwhile, I would recommend going with RPI2 if you’re working in a stationary environment. Six RPI2s could provide roughly the same amount of computation power as one Intel i7 processor. You’d be able to 1) quickly test your hypothesis or 2) debug your prototype in a real, multi-node environment.

(*Old generation Raspberry PI is not supported. Only Raspberry PI 2 is supported at the moment.)

Secondly, for PocketCluster to install and operate smoothly, you need a solid internet connection. All the software is downloaded and configured at run time, so a jumpy connection could really ruin your experience. I am working on improving this.

Third, Spark supports five different modes of operation: 1) Standalone, 2) Pseudo cluster, 3) Standalone cluster, 4) YARN client, and 5) Mesos client. PocketCluster supports Standalone cluster mode only at the moment.
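For context, a driver program reaches a Standalone cluster through a master URL of the form spark://HOST:PORT, where 7077 is Spark’s default master port. The sketch below only builds and validates such a URL. The host name `pc-master` is a hypothetical placeholder, not a name PocketCluster is known to use.

```python
# Build the master URL a Spark driver uses to reach a Standalone cluster.
# The URL scheme (spark://HOST:PORT) and the default port 7077 come from
# Spark itself; the host name used below is a made-up placeholder.

DEFAULT_MASTER_PORT = 7077


def standalone_master_url(host, port=DEFAULT_MASTER_PORT):
    """Return a spark://HOST:PORT URL for a Standalone cluster master."""
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("host must be a non-empty name without whitespace")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"spark://{host}:{port}"


if __name__ == "__main__":
    # "pc-master" is hypothetical; substitute the name or IP of your own master.
    print(standalone_master_url("pc-master"))
```

The resulting string is what you would hand to `spark-submit --master` or to `SparkConf().setMaster(...)` in a driver program.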

Lastly, as soon as the Spark installation completes, SparkR is configured to run across the slave nodes. It’s just that the Homebrew installation of R takes forever, since it needs to compile gcc to provide Fortran for R. Hence, should you need SparkR, type the following in a Terminal or iTerm shell on your Mac after installation. (*It could take about 40 minutes.)

brew tap homebrew/science && brew install r && brew untap homebrew/science

For more detailed instructions about PocketCluster installation, please go to my previous post.

Here comes PocketCluster 0.1.3 again. I’m looking for the next package to install. If you have a suggestion, please leave a comment below or tweet me @stkim1.



Build Hadoop Cluster with 5 clicks

This is the first post of a new series about BigData cluster for OSX and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster

*update: Dec 5, 2015. PocketCluster is updated to 0.1.3 for Apache Spark!

It was back in February of this year that the first Raspberry PI 2 (henceforth RPI2) cluster was built and Apache Spark ran on it. Back then, it was mostly an experiment and fun, since only a handful of folks had given a second thought to putting RPIs to that sort of use.

After I opened up the RPI2 cluster case (part 1, part 2, open schematic), things changed a tiny bit. Many had a good laugh at the cluster, and some were enthusiastic about the possibilities it offered.

In fact, Makezine and MagPi (pg #28) have covered the case, and many have retweeted and downloaded the schematics. Nonetheless, one question rang so strongly that it motivated me to finalize what I started back in July.

The question suggests he believes there is no clear way to put the power of multiple RPI2s to good use. I, for one, was a bit shocked, since there was indeed a good use case right in front of me. If I may be so daring, I’d like to change that viewpoint just a bit by presenting a dead-simple way to build a BigData cluster on a MacBook with RPI2s for experiments and experiences.

Let’s first go through the existing solutions. We don’t want to reinvent the wheel. One can easily point out that we already have Cloudera, Hortonworks, MapR, and Apache Ambari to automate the tedious setup process.

I would agree within a specific domain. Say we are working on multiple racks of powerful, datacenter-grade nodes and we need enterprise-level support. Then they are definitely the answer, and I would encourage you to click the links. They are, however, designed to work on powerful machines, not on something like a MacBook or RPI2s.

Others could give us multiple links on how to install Hadoop and Spark after five seconds of Google searching. Here is actually one you can try on RPI2. (Thank you Jonas!) It wouldn’t be much fun, though, if you had to edit text file after text file on RPI2 after RPI2.

Shouldn’t there be something that just works, is lightweight, and gets the tedious installation process out of your way when you just want to play with a BigData cluster on a MacBook and RPI2s?


Here comes PocketCluster (ver 0.1.3).

It builds you a Hadoop cluster with 3 slave nodes for Vagrant/VirtualBox, and up to 6 slave nodes for Raspberry PI 2, within 5 mouse clicks. (Yes, PocketCluster builds you a Hadoop cluster on two different platforms.) It takes no command line configuration to install and run.

Seeing is believing. Watch the videos below and chill with me.

Vagrant Cluster (3-Nodes)

Vagrant Cluster is designed to work without a single Raspberry PI 2. PocketCluster will create 3 slave nodes based on Vagrant + VirtualBox, and use OSX as the master. This variation is for those of you who have a MacBook and want to carry a multi-node environment all the time. By the way, you need at least 3 GB of memory and 9 GB of free disk space. (A Mac with at least 8 GB of memory is recommended.)
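If you want to check those minimums before installing, a small preflight script could look like the sketch below. It is not part of PocketCluster. The 3 GB and 9 GB thresholds come from the paragraph above, the installed memory is typed in by hand rather than detected, and treating "GB" as GiB is my assumption.

```python
# Preflight check for the Vagrant cluster's stated minimums:
# at least 3 GB of memory and 9 GB of free disk space.
import shutil

MIN_MEMORY_GB = 3      # from the post
MIN_FREE_DISK_GB = 9   # from the post
GIB = 1024 ** 3        # ASSUMPTION: "GB" is treated as GiB here


def free_disk_gb(path="/"):
    """Free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def meets_minimums(memory_gb, disk_gb):
    """True when both values reach the minimums quoted above."""
    return memory_gb >= MIN_MEMORY_GB and disk_gb >= MIN_FREE_DISK_GB


if __name__ == "__main__":
    # Installed memory is entered by hand; 8 is the recommended size from
    # the post, not a detected value.
    installed_memory_gb = 8
    disk = free_disk_gb()
    print(f"free disk: {disk:.1f} GiB")
    print("ok" if meets_minimums(installed_memory_gb, disk) else "below minimums")
```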

Install the following prerequisites first, and make sure the remote login service is enabled. Copy PocketCluster into the Applications folder. Then you are all set to go.

  1. Java 1.8
  2. Homebrew
  3. VirtualBox 5.0.10
  4. Vagrant 1.7.4

All the requirements are commonly used, and many of you already have them installed on your Macs. I strongly recommend you update your installation to the latest version. Actual installation could take up to 15 minutes. Most of that time is spent downloading files, so make sure your internet connection is solid. 😉

Raspberry PI 2 Cluster (Up to 6-Nodes)

Here comes the fun part. The RPI2 cluster does not require Vagrant/VirtualBox since it uses *real* nodes, which execute distributed jobs and store data. PocketCluster will set up up to 6 Raspberry PI 2s as slave nodes, and use your Mac as the master. Whatever challenges you experience in this environment, you will also encounter in a datacenter, a cloud, or an in-house cluster.

(*Old generation Raspberry PI is not supported. Only Raspberry PI 2 is supported at this point.)

Operating the RPI2 cluster does not need as much memory or disk space as the Vagrant cluster. I’d say it is safe to operate the cluster on a Mac with 4 GB of memory.

The RPI2 cluster requires an Ethernet connection. (It does not work over Wi-Fi.) If PocketCluster runs on a MacBook, you’d need one of the goodies below. (iMac, Mac mini, and Mac Pro do not need an adapter.)

Make sure all the RPI2s are behind the same router as the Mac, and that the remote login service is enabled just for you.
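Remote Login on a Mac is its SSH service, so one quick way to confirm it is on is to see whether anything accepts TCP connections on port 22. The sketch below does just that with Python’s standard library. The node addresses are hypothetical examples, and whether the RPI2 image also accepts SSH connections is my assumption, not something stated in this post.

```python
# Check whether something is accepting TCP connections on the SSH port.
# On a Mac, that is what the "Remote Login" sharing option turns on.
import socket

SSH_PORT = 22  # standard SSH port


def port_is_open(host, port=SSH_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is the Mac itself. The 192.168.1.x addresses are hypothetical
    # RPI2 nodes (ASSUMED to accept SSH); replace them with your own.
    for host in ["127.0.0.1", "192.168.1.101", "192.168.1.102"]:
        state = "reachable" if port_is_open(host) else "not reachable"
        print(f"{host}:{SSH_PORT} {state}")
```

A successful connection only shows that a service is listening. It does not prove that your user account is allowed to log in.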

First, download the RPI2 Ubuntu image below, and bake the image onto an SD card (an SD card of at least 8 GB is recommended). Everything you’d need is already installed and configured. Slide the baked SD cards into the RPI2 slots, and power up the RPI2s.


Then, install the following prerequisites, and copy PocketCluster into the Applications folder. You are good to go then.

  1. Java 1.8
  2. Homebrew

Actual installation could take up to 15 minutes or longer depending on your internet connection. Make sure your internet connection is solid.

Here comes the download page again. PocketCluster (ver 0.1.3).

This is indeed a continuation of my previous post. The points of having a Big Data cluster on your Mac boil down to the following.

  • Friendlier Environment.
  • Experiments and Experiences.
  • Multi-node cluster.

Vagrant cluster is more suited when you literally want to carry a cluster around in your pocket. On the other hand, RPI2 cluster will give you a real-life environment and challenges.

Since this is barely version 0.1.2, there are lots of improvements to be made, and more BigData software packages to be added. (Apache Spark is planned to be added next time!) If you have another suggestion or question, please leave a comment below. You can also tweet me @stkim1.


[Free Schematic] Raspberry PI 2 Cluster Assembly Tutorial

This is the third post of a series about Raspberry PI 2 BigData cluster case.

  1. Raspberry PI 2 Cluster Case pt1
  2. Raspberry PI 2 Cluster Case pt2
  3. Raspberry PI 2 Cluster Assembly Tutorial
  4. Build Hadoop Cluster with 5 clicks


Since my last post, Raspberry PI 2 Cluster Case pt2, readers have asked if I could open the schematics of the cluster case with detailed assembly instructions. So, here come the schematics under the Solderpad Hardware License. The license is equivalent to the Apache License, meaning you can make and distribute the case royalty-free. I would encourage you to contribute in the form of improvements and extensions, though.


1. Boards


Let’s first prepare the boards. Download the schematics. Cut out two pieces (2) of board-middle and one piece (1) of each of the other boards. You can use a wood, synthetic, or acrylic panel to cut out the shapes above. Whatever the material, you can use it as long as the panel is no thicker than 3 mm and strong enough to hold a USB charger.


If you cut the boards out of an acrylic panel, they will come with protective films on. Peel them off first.

Two pieces of board-middle and one piece of each.

2. Power and Cables


  1. One Photive 6-port 50-watt USB charger.
  2. Six 1 ft (30 cm) Cat6 Ethernet cables.
  3. Six 1 ft (30 cm) 90-degree right-angled Micro USB to USB cables.


Take a look at the cable. The Micro-USB-to-USB cable must be right-angled, not left-angled. You may also want to check the wire gauge and make sure your cable is thick enough to deliver enough power. The one available to me comes with AWG 26 wire. Several different brands are available, for example at Amazon, eBay, and Alibaba.
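To get a feel for what "thick enough" means, here is a rough calculation in Python. The charger wattage, port count, cable length, and wire gauge come from this post. The 5 V supply, the resistance of AWG 26 copper (about 40.81 ohms per 1000 ft in standard wire tables), the even split across ports, and the 1.0 A draw per board are assumptions for illustration, and connector resistance is ignored.

```python
# Rough power check for the cluster's USB wiring.
# From the post: a 50 W charger with 6 ports, 1 ft cables, AWG 26 wire.
# ASSUMED: 5 V USB supply, ~40.81 ohm per 1000 ft for AWG 26 copper
# (standard wire-table value), and a 1.0 A draw per board (illustrative only).

CHARGER_WATTS = 50.0
PORTS = 6
USB_VOLTS = 5.0
CABLE_FEET = 1.0
AWG26_OHM_PER_FOOT = 40.81 / 1000.0
ASSUMED_AMPS_PER_BOARD = 1.0


def amps_available_per_port(watts, ports, volts):
    """Current each port can draw if the charger's power is split evenly."""
    return watts / ports / volts


def cable_voltage_drop(amps, cable_feet, ohm_per_foot):
    """Voltage lost in the cable; current flows out and back, so 2x length."""
    return amps * (2 * cable_feet * ohm_per_foot)


if __name__ == "__main__":
    budget = amps_available_per_port(CHARGER_WATTS, PORTS, USB_VOLTS)
    drop = cable_voltage_drop(ASSUMED_AMPS_PER_BOARD, CABLE_FEET, AWG26_OHM_PER_FOOT)
    print(f"even split per port: {budget:.2f} A")
    print(f"drop over the cable: {drop:.3f} V")
```

Under these assumptions a short cable loses well under a tenth of a volt. Longer or thinner cables lose proportionally more, which is the reason to keep them at 1 ft.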

3. Screws

You then need screws. Buying right-sized screws is the hardest part of the whole process, at least for me. Here’s the list.


  1. 2 x M3 Hex Nuts.
  2. 22 x M3 25mm Pillar Screws.
  3. 4 x M3 Whirled Hex Nuts. (could be replaced with four of #1 plain M3 nuts.)
  4. 6 x M3 4mm Screws.
  5. 24 x M2.5 5mm Screws.
  6. 24 x M2.5 Hex Nuts
  7. 24 x M2.5 5mm Pillar Screws.

The four supporting holes in Raspberry PI 2 are 2.7 mm in diameter, and, ideally, you would want to use M2.6 screws and nuts for fixing Raspberry PI 2 since they fit rather tightly. The problem was that M2.6 pillar screws were hard to come by, and I had to custom order them. 😦

The ones linked here are therefore M2.5 screws and nuts. They are available off the shelf and only slightly smaller in diameter than the holes on Raspberry PI 2, so they should not cause much of an issue. I hope these holes get bigger in the future so that I can simply use good old plain M3 screws.
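The M2.5 quantities in the list follow directly from the board count, as the short Python check below shows. The six boards, four holes per board, and 2.7 mm hole size come from this post. That each hole takes one pillar screw, one hex nut, and one 5 mm screw is my reading of the assembly steps, and 2.5 mm is the nominal diameter of an M2.5 thread.

```python
# Sanity-check the M2.5 quantities in the parts list and the hole clearance.
# From the post: 6 boards, 4 mounting holes per board, 2.7 mm holes.
# An M2.5 thread has a nominal diameter of 2.5 mm.

BOARDS = 6
HOLES_PER_BOARD = 4
HOLE_DIAMETER_MM = 2.7
M2_5_DIAMETER_MM = 2.5

# ASSUMPTION (my reading of the assembly steps): each hole takes one pillar
# screw, one hex nut, and one 5 mm screw.
PARTS_PER_HOLE = ("M2.5 5mm pillar screw", "M2.5 hex nut", "M2.5 5mm screw")


def fastener_counts(boards, holes_per_board):
    """Map each M2.5 part to the quantity needed for the whole cluster."""
    total_holes = boards * holes_per_board
    return {part: total_holes for part in PARTS_PER_HOLE}


def diametral_clearance_mm(hole_mm, screw_mm):
    """Gap between hole and screw diameters, rounded to 0.01 mm."""
    return round(hole_mm - screw_mm, 2)


if __name__ == "__main__":
    for part, qty in fastener_counts(BOARDS, HOLES_PER_BOARD).items():
        print(f"{qty} x {part}")
    gap = diametral_clearance_mm(HOLE_DIAMETER_MM, M2_5_DIAMETER_MM)
    print(f"clearance: {gap} mm")
```

If you build a smaller cluster, change `BOARDS` and the quantities scale with it.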

4. Raspberry PI 2 Model B


  1. 6 x Raspberry PI 2 Model B

We do not need to say more here. 🙂 Let’s call them RPI henceforth.

5. Network Switch (Optional)

In case you want to fit a network switch in the cluster, pick something slim. I picked the one below.

  1. D-Link 8-Port Gigabit Desktop Switch

One thing to remember is that you can choose not to include a network switch at all. You can simply mount the Raspberry PIs alone and connect them to an existing network. The switch is just there to give the cluster its own network.


Once you have the components and materials, it’s rather straightforward. First, put M2.5 5mm Pillar Screws on the boards and tighten ’em with M2.5 Hex Nuts like below. You’d have to do this for 6 sets.



Then fix RPI on the boards with M2.5 Screws.


Once you’re done, you’d have six sets. You’d see that the RPIs on the right side are all flipped. This way, all the RPIs’ power inputs sit right next to the USB charger, reducing the cluster’s volume. The big round holes you’d also see are for ventilation.


Apply two pieces of double-sided tape on a USB charger supporter. This is to give a bit more strength to the board to support the charger.


Place it on the board named board-middle-end like below.



Then it is time to stack up the RPIs. Look carefully at how each piece is stacked.



Once you complete stacking up RPIs, plug in Micro-USB-to-USB cables.



Once you plug in the cables, place your USB charger in the middle, and close the cluster with board-top. Don’t forget to tighten up with M3 4mm Screws.


Now plug the USB ends into the USB charger. See how the boards hold the charger and leave the entrance clear for the USB ends? 🙂


Place the network switch at the bottom layer and close it with board-bottom and four M3 Hex Nuts like below.


Then connect your RPIs and the network switch with Cat6 Ethernet Cables.


Once you supply power to the charger, all the RPIs will light up!



This is how a 6-node RPI cluster is assembled. You can replace as many parts as you wish, and modify as much as you want. All these instructions and schematics are here as a template.

If you get to build one, please share yours with the #rpicluster hashtag on Twitter so that we all can see! Now, here come the schematics again, and let the fun begin!
