Weekly Machine Learning Opensource Roundup – Feb. 22, 2018


A guide for High School students to learning Machine Learning and AI
A learning path in Machine Learning and Artificial Intelligence for High School students

NLP concepts with spaCy
The aim of this notebook is to introduce a few simple concepts and techniques from NLP – just the stuff that’ll help you do creative things quickly

A critical reading list for engineers, designers, and policy makers
Toward ethical, transparent and fair AI/ML: a critical reading list for engineers, designers, and policy makers

Dynamic Neural Manifold
A neural network architecture with a static execution graph that acts as a dynamic neural network in which connections between various neurons are controlled by the network itself.


Python pretty printer for matrices and column vectors.


An implementation of Nvidia’s fast photorealistic style transfer algorithm. Given a content photo and a style photo, the code can transfer the style of the style photo to the content photo.

Efficient Neural Architecture Search (ENAS) in PyTorch
PyTorch implementation of “Efficient Neural Architecture Search via Parameters Sharing”

Neural Phrase-based Machine Translation
NPMT explicitly models the phrase structures in output sequences using Sleep-WAke Networks (SWAN), a recently proposed segmentation-based sequence modeling method


A Python framework for sequence labeling evaluation (named-entity recognition, pos tagging, etc…)

Pytorch CNN Finetune
Fine-tune pre-trained Convolutional Neural Networks with PyTorch

Would you like to add your project? Tweet @stkim1!

The Next Version of PocketCluster

It has been quite some time since the PocketCluster application and the Raspberry PI image disappeared from the download page, and many have asked when they would be available again.

Let’s rewind the clock a bit. The original version of PocketCluster was written to build an Apache Hadoop + Spark cluster with Raspberry PIs and a Mac. In late 2015, when Google TensorFlow became available to the public, it became clear that PocketCluster needed to handle more than one cluster framework so that its users could properly run whatever task they had at hand. For a sense of scale, PocketCluster Index tracks 129 frameworks, not to mention 1,200+ libraries, toolsets, and models.

Installing Apache Spark with PocketCluster could already take several hours in the worst case, so the new requirement on the horizon was simply a stretch too far. Business-as-usual patchwork upgrades here and there could resolve the issue only if there were a sound foundation. Without one, duct-tape measures would not take PocketCluster anywhere near where you would want it to be. It badly needed to be rebuilt with certain goals in mind, and that rebuild has been under way for some time.

The following are the changes made so far. We can talk about the goals in a later post.

Same Simple Installation


Your Mac is the master node of your cluster, and Raspberry Pi (RPI) devices are the slave nodes. Write the provided image to SD cards and boot up the RPIs. Drag and drop PocketCluster into the Applications folder, then double-click it: no black-and-white terminal, no commands to copy and paste, no wall of text. This is 2017 after all.


All-In-One, Ready-made, Out-of-the-Box Package



Cluster frameworks such as Apache Spark or Hadoop come in a package that needs no extra configuration steps. Spend your time on your main task, and let PocketCluster take care of the small, tedious chores.

Drastically Reduced Installation Time

With the previous versions, installing a package could take up to several hours in some cases, and it often failed completely. The new version will finish installing a package of Hadoop + Spark + Jupyter across your entire cluster within half an hour.

Secure Network Connection

Most of PocketCluster’s network connections are securely encrypted. This is done not only out of the necessity to protect your cluster from malicious infiltration attempts, but also to provide you with a shielded environment. As a result, multiple clusters can exist in the same workplace or home, while your cluster operates just for you and nobody else.

All 64-bit Kernel


PocketCluster runs the RPI3 with a 64-bit kernel. The previous versions ran on a 32-bit kernel, which significantly hampered the ability to handle large data. One might argue that the RPI3 only has 1GB of memory, so there is no point in deploying such a memory-hungry kernel.

The kernel shift certainly comes with a plan. Besides the RPI3, three more single-board computer models in about the same price range have been evaluated, and one or two will be added to the supported device category. They will have significantly more memory and I/O capacity to enhance your experience with PocketCluster.
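
To put rough numbers on the 32-bit versus 64-bit difference, here is a small Python sketch of the address-space arithmetic. It is only an illustration and not PocketCluster code: the function names and the 10 GiB dataset are made up for this example, and the practical limit for a single process on a 32-bit kernel is lower than the theoretical figure because the kernel reserves part of the address space for itself.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(pointer_bits):
    """Upper bound on the bytes a pointer of the given width can address."""
    return 2 ** pointer_bits


def fits_in_address_space(size_bytes, pointer_bits):
    """True if data of size_bytes could be addressed in one piece."""
    return size_bytes <= addressable_bytes(pointer_bits)


if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit: {addressable_bytes(bits) // GIB} GiB addressable")
    dataset = 10 * GIB  # hypothetical dataset size, for illustration only
    print("fits in 32-bit:", fits_in_address_space(dataset, 32))
    print("fits in 64-bit:", fits_in_address_space(dataset, 64))
```

The point is that the limit applies to addresses, not just installed RAM: a file larger than 4 GiB cannot be memory-mapped in one piece on a 32-bit system, however much memory the board carries.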

A Few Words on the Missing Regular Updates


It is not that posting regular updates on the progress has had the lowest priority. Quite the opposite. The rebuilding so far, however, strongly resembles a job where you are to repair a road with thousands of small but deep cracks. You are to fill them all up and make the road smooth enough for cars to fly through without drivers feeling any bump. There have been many moments when cracks suddenly went much deeper than foreseen. It was rather difficult to write updates in those moments, and the weekly roundups served as substitutes for progress updates, like a sign of life.

Even at this point, many corners have been cut to meet a tentative timeline of releasing the new version before Christmas of this year. Because there is now a strong and sound foundation, however, the experience with PocketCluster is set to improve accordingly, and all those corners will be revisited and reworked eventually.

Thank you very much for keeping your interest in PocketCluster, and stay tuned.

1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking into adding your repo? tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Aug. 17, 2017


Effective TensorFlow
TensorFlow tutorials and best practices

CommandCenter: StarCraft 2 AI Bot
CommandCenter is a StarCraft II playing bot written in C++ using Blizzard’s StarCraft II AI API


A python tool for evaluating the quality of sentence embeddings.

PyEcharts is a library to generate charts using Echarts. It simply provides the interface between Echarts and Python.

Tensor Bridge
Tensor Bridge – OpenAPI spec and REST wrapper around TensorFlow Serving

Jupyter C Kernel
Minimal Jupyter C kernel

Karura enables you to use machine learning interactively

A TensorBoard plugin for visualizing arbitrary tensors in a video as your network trains.


MLlib Convolutional and Feedforward Neural Network implementation with a high level API and advanced optimizers.

A hardware-accelerated deep learning library for the web.

A library of tools to train and run neural networks for computer vision tasks using Chainer.

A deep reinforcement learning library built on top of Chainer.

Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow



Weekly BigData & ML Roundup – Apr. 27, 2017


Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples

Snowplow with Kafka
Example showing Snowplow Tracker and Collector writing to Kafka and being consumed from there

REST API for Text Summarization and Keywords Extraction

An artificial intelligence written entirely in JavaScript that recognises the font of a text in an image using the Tesseract optical character recognition engine and some image processing libraries

How to learn AI / Deep learning / Machine Learning
A practical, top-down approach, starting with high-level frameworks and moving to increasingly difficult problems, beginning with test problems on clean datasets and then moving toward real-world problems

Awesome Machine Learning with Ruby
Minimal and Clean Reinforcement Learning Examples

Evaluation of Deep Learning Toolkits
This research was done in late 2015 with slight modifications in early 2016. Many toolkits have improved significantly since then


This is a bunch of code to port Keras neural network model into pure C++

Streamlining phylogenomic data gathering, processing and visualization

Desktop notebook app + packages


CycleGAN Models
Models generated by CycleGAN

Autosklearn Zeroconf
A fully automated binary classifier based on the AutoML challenge winner auto-sklearn

Exploration of methods for coloring t-SNE.


Module for automatic summarization of text documents and HTML pages.

fastText Multilingual
Multilingual word vectors in 78 languages


Weekly BigData & ML Roundup – Nov. 17, 2016

We have a pocketful of goodies today! Airbnb Caravel has been renamed to Superset. Also, we have MLAlgorithm, which offers a rare opportunity to “learn internals of ml algorithms or implement them from scratch” with a simpler, easier codebase.


Investing S&P500
Investing Returns on the Market as a Whole


Interactive convnet features visualization for Keras

A set of useful perceptually uniform colormaps for plotting scientific data

Data Science Utils
Some wrappers around python modules for simplifying the data exploration process.


Place Recognition Using Autoencoders & NN
Place recognition with WiFi fingerprints using Autoencoders and Neural Networks

Neural Cryptography
Neural Networks that invent their own encryption

Miles Deep – AI Porn Video Editor
Deep Learning Porn Video Classifier/Editor with Caffe


Minimal and clean examples of machine learning algorithms

Airbnb Superset
Superset is a data exploration platform designed to be visual, intuitive, and interactive

Uber Deck.gl
WebGL based visualization layers

Statistics utilities for the JVM – in Scala!

*https://pocketcluster.wordpress.com will move to https://blog.pocketcluster.io on Jan 1, 2017. If you have subscribed to this blog, please make sure to change the feed address.

Looking into adding your repo? Any suggestions or comments? Send your feedback to stkim1@pocketcluster.io, or tweet to @stkim1!

Looking for more BigData or Machine Learning repositories? You can find a lot more tools, frameworks and libraries at PocketCluster Index.


Apache Spark 1.5.2 on Raspberry PI 2 cluster

This is the second post of a series about a Big Data cluster for OS X and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster
  3. The Next Version of PocketCluster

** update: Oct 2, 2017. The next version of PocketCluster is coming soon!

Once we have a foundation on which to build a Big Data analytics stack, the next piece that should come is Apache Spark. Spark speeds up MapReduce-based algorithms by executing such computations in memory and by optimizing the DAG of operations. You can read more details in this paper.

Spark helps analytics computation tremendously because such computations are iterative in nature. Suppose you have to read and write 100 GB of data on disk again and again, and imagine how painfully slow that would be. (Some folks handle a couple of petabytes every day. Let’s not go there yet.) Spark, by contrast, handles such operations in memory. You really ought to experience the difference. If you do anything with Big Data analytics, Spark is the one thing you will encounter in whichever direction you go.
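
The difference can be sketched without Spark at all. The plain-Python toy below is not Spark code and is not how Spark is implemented. It only contrasts an iterative job that reloads its input from disk on every pass with one that loads the data once and keeps it in memory. The tiny dataset, the update rule, and every function name are assumptions made up for the illustration.

```python
import os
import tempfile


def write_dataset(path, values):
    """Write one number per line, standing in for a large on-disk dataset."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_dataset(path, counter):
    """Read the whole dataset from disk and count the disk pass."""
    counter["disk_reads"] += 1
    with open(path) as f:
        return [float(line) for line in f]


def step(estimate, data, rate=0.5):
    """One iteration: move the estimate toward the mean of the data."""
    gradient = sum(estimate - x for x in data) / len(data)
    return estimate - rate * gradient


def iterate_from_disk(path, iterations, counter):
    """Disk-bound style: every iteration reloads its input from disk."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(estimate, read_dataset(path, counter))
    return estimate


def iterate_in_memory(path, iterations, counter):
    """In-memory style: load once, then iterate over the cached data."""
    data = read_dataset(path, counter)
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(estimate, data)
    return estimate


def demo(iterations=10):
    """Run both styles on the same temporary file and report disk passes."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    tmp.close()
    try:
        write_dataset(tmp.name, range(1, 101))
        disk, mem = {"disk_reads": 0}, {"disk_reads": 0}
        a = iterate_from_disk(tmp.name, iterations, disk)
        b = iterate_in_memory(tmp.name, iterations, mem)
        return a, b, disk["disk_reads"], mem["disk_reads"]
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    a, b, disk_reads, mem_reads = demo()
    print(f"same answer: {a == b}")
    print(f"disk passes: {disk_reads} (disk-bound) vs {mem_reads} (in-memory)")
```

Both versions arrive at the same estimate, but the disk-bound one touches the disk once per iteration. With 100 GB of input instead of a hundred lines, those repeated passes are where the time goes.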

In fact, the very first two posts of this blog are about running Spark on Raspberry PI 2 (henceforth RPI2). Nevertheless, it wasn’t much of a joy to build and run such a thing. If you’ve been with me, you know we’ve crossed some serious creeks. Now, here comes an OS X application that deploys Apache Spark & Hadoop with a few mouse clicks.


PocketCluster 0.1.3

Just like in the previous post, I’m going to play a video and talk about a few more details.

First of all, PocketCluster supports Vagrant and Raspberry PI 2 at the same time. If you want to carry a multi-node Big Data environment with you all the time, it is definitely recommended to go with the Vagrant version. The installation and operation process is exactly the same as the one shown in the video.

Meanwhile, I would recommend going with RPI2 if you’re working in a stationary environment. Six RPI2s could provide roughly the same amount of computation power as one Intel i7 processor. You’d be able to 1) quickly test your hypothesis or 2) debug your prototype in a real, multi-node environment.

(*Older-generation Raspberry PI models are not supported. Only Raspberry PI 2 is supported at the moment.)

Second, in order for PocketCluster to install and operate smoothly, you need a solid internet connection. All the software is downloaded and configured at run time, so a jumpy connection could really ruin your experience. I am working on improving this.

Third, Spark supports five different modes of operation: 1) Standalone, 2) Pseudo cluster, 3) Standalone cluster, 4) YARN client, and 5) Mesos client. PocketCluster supports Standalone cluster mode only at the moment.
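
In practice, a mode is usually selected through the master URL handed to spark-submit. The Python sketch below builds such a command line for a few of the modes, using the master URL forms documented for Spark 1.x. It is a hedged illustration and not part of PocketCluster: the host name pc-master, the script name job.py, and the helper functions are hypothetical, and mapping the mode names in the list above onto these URL forms is an assumption.

```python
# Usual default ports for a standalone master and a Mesos master.
DEFAULT_PORTS = {"standalone-cluster": 7077, "mesos": 5050}


def master_url(mode, host="pc-master", port=None, threads="*"):
    """Return a --master value for the mode (host name is hypothetical)."""
    if mode == "local":
        # Single JVM on one machine, using the given number of threads.
        return f"local[{threads}]"
    if mode == "standalone-cluster":
        return f"spark://{host}:{port or DEFAULT_PORTS[mode]}"
    if mode == "mesos":
        return f"mesos://{host}:{port or DEFAULT_PORTS[mode]}"
    if mode == "yarn-client":
        # Spark 1.x reads the cluster location from the Hadoop configuration.
        return "yarn-client"
    raise ValueError(f"unknown mode: {mode}")


def submit_command(mode, script, **kwargs):
    """Build the spark-submit argument list for a script."""
    return ["spark-submit", "--master", master_url(mode, **kwargs), script]


if __name__ == "__main__":
    # job.py is a placeholder for your own Spark application.
    print(" ".join(submit_command("standalone-cluster", "job.py")))
```

For the Standalone cluster mode, the spark:// form is the relevant one, and it needs the address of the master node.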

Lastly, as soon as the Spark installation completes, SparkR is configured to run across the slave nodes. It’s just that the Homebrew installation of R takes forever, since it needs to compile gcc to provide Fortran for R. Hence, should you need to use SparkR, just type the following in a Terminal or iTerm shell on your Mac after installation. (*It could take about 40 minutes.)

brew tap homebrew/science && brew install r && brew untap homebrew/science

For more detailed instructions about PocketCluster installation, please go to my previous post.

Here comes PocketCluster 0.1.3 again. I’m looking for the next package to install. If you have a suggestion, please leave a comment below or tweet me @stkim1.
