Weekly BigData & ML Roundup – Oct. 19, 2017


Rapid Draw
A simple artificial intelligence experiment to find out if mobile neural networks can recognize human-made doodles

Chancey NN
Predict college admissions outcome using AI

Instagram Influencers
Identifying a Large Number of Fake Followers on Instagram

NLP Tasks
Natural Language Processing Tasks and Selected References

The Python Graph Gallery
A website displaying hundreds of charts made with Python


a language for image processing and computational photography

Test Tube
Python library to easily log, organize and optimize Deep Learning experiments

Artemis aims to get rid of all the boring, bureaucratic coding involved in machine learning projects, so you can get to the good stuff quickly.

Scikit Plot
An intuitive library to add plotting functionality to scikit-learn objects.

Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.


Tensorflow Implementation of Generative Adversarial Imitation Learning

PyTorch SepConv
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch


C++ native client for Impala and Hive, with Python / pandas bindings

Guided LDA
Semi-supervised guided topic model with custom guidedLDA

BLAS-like Library Instantiation Software Framework


A Hyper-Relational Database for Knowledge-Oriented Systems

Distributed training framework for TensorFlow.


BigData and ML Toolset & Library Weekly Roundup – Oct. 12, 2017

PocketCluster Index now has "search" activated at the top navigation bar.


PyTorch Zero To All
Simple PyTorch Tutorials Zero to ALL!

Pandas Cookbook
Recipes for using Python’s pandas library

Simple LSTM
Minimal, clean example of LSTM neural network training in Python, for learning purposes.

Awesome GAN Applications
Curated list of awesome GAN applications and demo

Word-level language modeling RNN
word-language-model imported and modified from pytorch-examples

Nvidia OpenSeq2Seq
Multi-GPU sequence to sequence learning

Knowledge Browser
Query Spark in real time and visualise the results as a graph.

OpenSim Reinforcement Learning
Reinforcement learning environments with musculoskeletal models


Deep Learning toolkit for Computer Vision

Fast, flexible and easy to use probabilistic modelling in Python.

SQL-based streaming analytics platform at scale

Image Monkey Core
ImageMonkey is a free, public open source image validation service.

Sequence Semantic Embedding
An encoder framework toolkit for NLP-related tasks, implemented in TensorFlow by leveraging TF’s convenient DNN, CNN, RNN, LSTM, etc.


Poincare Embeddings
NumPy implementation of Poincaré Embeddings for Learning Hierarchical Representations (Facebook Research)

PyTorch QRNN
PyTorch implementation of the Quasi-Recurrent Neural Network – up to 16 times faster than NVIDIA’s cuDNN LSTM

This repository provides code for machine learning algorithms for edge devices developed at Microsoft Research India.


Deep Learning Library (DLL) for C++

Neural networks in JavaScript


A lightweight, modular, and scalable deep learning framework.

Weekly BigData & ML Roundup – Oct. 5, 2017


Pytorch Exercises
Get familiar with PyTorch through simple quizzes

Fast Neural Style Transfer
Demo of in-browser Fast Neural Style Transfer with Deeplearn.JS library

2x Image Resolution
Rescale images to two times the original size using Decision Tree models. Matches and improves on traditional rescaling methods such as bilinear resampling. Noticeable improvements in the perceived sharpness of the image.

The AI can paint on a sketch according to a given specific color style.


Tensorflow UE4
Unreal Engine plugin for TensorFlow. Enables training and implementing state of the art machine learning algorithms for your unreal projects.

Validation of local and remote data tables

Apache RocketMQ
A distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

Apple Core ML Tools
Core ML community tools contains all supporting tools for CoreML model conversion and validation. This includes Scikit Learn, LIBSVM, Caffe, Keras and XGBoost.

Facebook ELF
An Extensive, End-To-End, Lightweight and Flexible Platform for Game Research

Visualization of NBA games from raw SportVU data logs


Semantic Segmentation
Semantic Segmentation using Fully Convolutional Neural Network.

Auto Sleep Scorer
An open-source sleep stage classification Python package

TensorFlow implementation of ENet, trained on the Cityscapes dataset.

CycleGAN Tensorlayer
Re-implement CycleGAN in Tensorlayer

State-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc.

A PyTorch implementation of the DeepMoji model

PyTorch NTM
A PyTorch implementation of an NTM (Neural Turing Machine)

ShuffleNet implementation in TensorFlow


The goal of this library is to give the user the ability to efficiently train Deep Learning models in a homomorphically encrypted state without needing to be an expert in either

Fully asynchronous, pure JavaScript implementation of the Parquet file format

Mocked Streams
Scala Library for Unit-Testing Processing Topologies in Apache Kafka / Kafka Streams

A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

Reinforcement Learning framework to facilitate development and use of scalable RL algorithms and applications

Go scientific library for scientific computations involving linear algebra, special functions, Bessel, fast Fourier transforms, geometry calculations, NURBS, numerical quadrature, polyhedra, 3D transfinite interpolation, random numbers, Mersenne twister, probability distributions, optimisation, graph, plotting, visualisation, tensors, eigenvalues, differential equations, and much more.

Apache CarbonData
Apache CarbonData is an indexed columnar data format for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.


The Next Version of PocketCluster

It has been quite some time since the PocketCluster application and the Raspberry Pi image disappeared from the download page, and many have asked when they would be available again.

Let’s rewind the clock a bit. The original version of PocketCluster was written to build an Apache Hadoop + Spark cluster with Raspberry Pis and a Mac. Back in late 2015, when Google TensorFlow became available to the public, it also became clear that PocketCluster needed to handle more than one cluster framework so that its users could properly execute whatever task was at hand. For example, PocketCluster Index tracks 129 frameworks, not to mention 1,200+ libraries, toolsets, and models.

When installing Apache Spark with PocketCluster could take several hours in the worst case, the new requirement on the horizon was simply a stretch too far. Business-as-usual patchwork upgrades here and there could resolve the issue only if there were a sound foundation. Otherwise, that kind of duct-tape measure would land PocketCluster nowhere near where you would want it to be. It badly needed to be rebuilt with certain goals in mind, and that rebuild has been under way for some time.

The following are the changes made so far. We can talk about the goals in a later post.

Same Simple Installation


Your Mac is the master node of your cluster, and Raspberry Pi (RPI) devices are slave nodes. Burn the provided image onto SD cards and boot up the RPIs. Drag and drop PocketCluster into the Applications folder. Then double-click it: no black-and-white terminal, no commands to copy and paste, no wall of text. This is 2017, after all.


All-In-One, Ready-made, Out-of-the-Box Package



Cluster frameworks such as Apache Spark or Hadoop come in a package that needs no extra configuration steps. Spend quality time focusing on your main task, and let PocketCluster take care of the rest of the small, tedious chores.

Drastically Reduced Installation Time

With the previous versions, installing a package could take up to several hours in some cases, and it often failed completely. The new version will complete installing a package of Hadoop + Spark + Jupyter across your entire cluster within half an hour.

Secure Network Connection

Most of PocketCluster's network connections are securely encrypted. This is done not only out of the necessity of protecting your cluster from malicious infiltration attempts, but also to provide you with a shielded environment. Multiple clusters can therefore exist in one workplace or home, while your cluster operates just for you and nobody else.

All 64-bit Kernel


PocketCluster runs the RPI3 with a 64-bit kernel. The previous versions operated on a 32-bit kernel, which significantly hampered the ability to handle large data. One might argue that the RPI3 has only 1GB of memory, so there is no point in deploying such a memory-hungry kernel.

The kernel shift, of course, comes with a plan. Besides the RPI3, three more single-board computer models in about the same price range have been evaluated, and one or two of them will be added to the supported devices. They will have significantly more memory and I/O capacity, which should enhance your experience with PocketCluster.
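
To put a rough number on the 32-bit versus 64-bit point, here is a small back-of-the-envelope sketch. It only computes the theoretical ceiling of an N-bit address space; it is not taken from PocketCluster, the function names are made up for illustration, and real kernels and hardware expose less than this ceiling (part of the address space is reserved, and 64-bit CPUs use fewer than 64 address bits in practice).

```python
# Illustrative arithmetic only, not PocketCluster code.
# All names below are hypothetical helpers for this sketch.

GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(pointer_bits: int) -> int:
    """Theoretical number of byte addresses an N-bit pointer can express."""
    return 2 ** pointer_bits


def fits_in_address_space(size_bytes: int, pointer_bits: int) -> bool:
    """True if a data set of this size could be addressed in one piece."""
    return size_bytes <= addressable_bytes(pointer_bits)


if __name__ == "__main__":
    dataset = 8 * GIB  # an assumed example size, chosen for illustration
    print("32-bit ceiling in GiB:", addressable_bytes(32) // GIB)
    print("8 GiB fits in 32-bit space:", fits_in_address_space(dataset, 32))
    print("8 GiB fits in 64-bit space:", fits_in_address_space(dataset, 64))
```

The takeaway is that a 32-bit address space tops out at 4 GiB regardless of how much memory or storage a board has, so the ceiling matters more as boards with more memory and I/O capacity join the list.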

A few words on the missing regular updates


Posting regular progress updates is by no means the lowest priority; it is exactly the opposite. The rebuilding so far, however, strongly resembles repairing a road with thousands of small but deep cracks. You have to fill them all and make the surface smooth enough for cars to fly over without the drivers feeling a bump. There have been many moments when a crack suddenly went far deeper than foreseen. It was difficult to write updates at those moments, and the weekly roundups served as substitutes for progress updates, like a sign of life.

Even at this point, many corners have been cut to meet a tentative timeline for releasing the new version before Christmas this year. Because there is now a strong and sound foundation, however, the experience with PocketCluster should improve accordingly, and all those corners will be revisited and rebuilt eventually.

Thank you very much for your continued interest in PocketCluster, and stay tuned.

Weekly BigData & ML Roundup – Sep. 28, 2017

Yahoo has recently open-sourced Vespa, a Big Data Processing and Serving Engine, which CNBC praised for its battle-hardened content recommendation, ad targeting, and search execution capabilities.


Deep Learning tutorials in jupyter notebooks.

Archive the Twitter sample firehose and daily trends

Benchmark Databases
A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).

Baidu – Mobile Deep Learning
This research aims at simply deploying CNN (Convolutional Neural Network) on mobile devices, with low complexity and high speed.


A Game Agent Framework helping you create AIs / Bots to play any game you own

Unity Machine Learning Agents
Unity Machine Learning Agents

This C++ toolbox is aimed at representing and solving common AI problems, implementing an easy-to-use interface with Python bindings which should hopefully be extensible to many problems, while keeping code readable.

A dataset for RGB-D machine learning tasks captured throughout 90 properties with a Matterport Pro Camera


PyTorch Generative Model Collections
Collection of generative models in Pytorch version

Splitting GAN
Code for Class-Splitting Generative Adversarial Networks


A Library for Bayesian Deep Learning, Generative Models, Based on Tensorflow

A small Julia library and package wrapper for ML/PR/AI

Computational graph library for Machine Learning. The main point is to combine mathematical operations to form a workflow of choice. The graph takes care of evaluating the gradient of all the inputs to make setting up the minimizer easier.

Face Alignment
2D and 3D Face alignment library built using PyTorch


Yahoo – Vespa
An engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time.

Weekly BigData & ML Roundup – Sep. 21, 2017


The fast.ai deep learning library, lessons, and tutorials

TensorFlow Tutorials
TensorFlow Tutorials with YouTube Videos

Evolving Snakes
Snakes from the classical game are controlled by neural networks and evolve using a genetic algorithm.

TagSpace tensorflow
Tensorflow implementation of Facebook TagSpace

DeepLearning Benchmark
Playing with various deep learning tools and network architectures

Interactive Real-Time Visualization for Streaming Data


A minimalist tree plotting library using toyplot graphs

The new architecture of co-computation for data processing and machine learning.

Cat Classifier
An experiment to visualize a trained logistic regression model as graph plots

FAIR Sequence-to-Sequence Toolkit
Facebook AI Research Sequence-to-Sequence Toolkit written in PyTorch

A platform to build deep learning models online


Torch implementation of various types of GAN (e.g. DCGAN, ALI, Context-encoder, DiscoGAN, CycleGAN)

A general-purpose neural model for efficient learning of entity embeddings for solving a wide variety of problems

Volumetric Regression Network
Torch7/MATLAB code for “Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression”


A flexible neural network library for Node.js and the browser

TensorFlow API for .NET languages

Weekly BigData & ML Roundup – Sep. 14, 2017


Awesome Pytorch List
A comprehensive list of PyTorch-related content on GitHub such as different models, implementations, helper libraries, tutorials, etc.

Awesome AI Security
A curated list of AI security resources.

Technical Book on Deep Learning
This note presents in a technical though pedagogical way the three most common forms of neural network architectures: Feedforward, Convolutional and Recurrent.

A toolkit for controlling Euro Truck Simulator 2 with Python to develop self-driving algorithms.


Deep Learning Model Convertors
Converters for deep learning models across different deep learning frameworks and software.

Open Neural Network Exchange (ONNX) is the first step toward an open ecosystem that empowers AI developers to choose the right tools as their project evolves

An open source web application that helps researchers, students and data scientists to create, collaborate and participate in various AI challenges organized around the globe

A JavaScript WebGL Framework for Data Visualization

AutoML Service
Deploy AutoML as a service using Flask

Lexicon Rainbow
A minimal data visualization module between a single ordinal scale and a single linear scale with in-built GUI


TensorFlow GANs Comparison
Implementations of (theoretical) generative adversarial networks and comparison without cherry-picking

PyTorch ActorCriticRL
PyTorch implementation of DDPG algorithm for continuous action reinforcement learning problem.

A recurrent unit that can run over 10 times faster than cuDNN LSTM without loss of accuracy, from the paper “Training RNNs as Fast as CNNs”

Deep Recommender
Deep learning for recommender systems


TensorFlow Agents
Efficient Batched Reinforcement Learning in TensorFlow

An open-source NLP research library, built on PyTorch.

A modern, lightweight, performant and tunable OpenCL BLAS library written in C++11, designed to leverage the full performance potential of a wide variety of OpenCL devices from different vendors.

