Weekly BigData & ML Roundup – May 4, 2017

New Apache Top-Level Project, CarbonData

Apache CarbonData™ became a Top-Level Project (TLP) on May 1, 2017. It is in use at a variety of organizations, including Bank of Communications, medical/pharma social platform DXY, Hulu, Huawei, group online retailer MEITUAN, SAIC Motor, and Zhejiang Mobile, among others.

Examples

1TB ML Benchmark
Benchmark of different ML algorithms on Criteo 1TB dataset

GPS Machine Learning
A repository of code and Jupyter notebooks with machine learning algorithms for working with GPS trajectories, from IotTechDay2017.

Blaze Getting Started
Introduction to the Blaze ecosystem.

Awesome Machine Learning for Cyber Security
Machine Learning Project Collection for Cyber Security

Toolset

Perceptron
A flexible artificial neural network builder for analysing performance and optimising the best model.

Models

Openpose
A Real-Time Multi-Person Keypoint Detection And Multi-Threading C++ Library

Automatic Speech Recognition
End-to-end automatic speech recognition from scratch in TensorFlow

PyTorch Style Transfer
Neural Style and MSG-Net

Libraries

Kafka PHP
Kafka PHP client

Salient
Machine Learning, Natural Language Processing and Sentiment Analysis Toolkit for Node.js

Forge
A neural network toolkit for Metal from Apple.

Frameworks

Apache cTAKES
A natural language processing system for extraction of information from electronic medical record clinical free-text

Pilosa
An open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.

Apache CarbonData
Apache CarbonData is an indexed columnar data format for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.

 


1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking to add your repo? Tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Apr. 27, 2017

Examples

Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples

Snowplow with Kafka
Example showing Snowplow Tracker and Collector writing to Kafka and being consumed from there

TextAI
REST API for Text Summarization and Keywords Extraction

Typefont
An artificial intelligence written entirely in JavaScript that recognises the font of a text in an image using the Tesseract optical character recognition engine and some image processing libraries

How to learn AI / Deep learning / Machine Learning
A practical, top-down approach that starts with high-level frameworks and works up to increasingly difficult problems, beginning with test problems on clean datasets and then moving towards real-world problems

Awesome Machine Learning with Ruby
A curated list of resources for machine learning in Ruby

Evaluation of Deep Learning Toolkits
This research was done in late 2015 with slight modifications in early 2016. Many toolkits have improved significantly since then

Toolsets

Keras2C++
Code for porting a Keras neural network model to pure C++

TriFusion
Streamlining phylogenomic data gathering, processing and visualization

nteract
Desktop notebook app + packages

Models

CycleGAN Models
Models generated by CycleGAN

Autosklearn Zeroconf
A fully automated binary classifier based on the AutoML challenge winner auto-sklearn

Coloring-t-SNE
Exploration of methods for coloring t-SNE.

Libraries

Sumy
Module for automatic summarization of text documents and HTML pages.

fastText Multilingual
Multilingual word vectors in 78 languages


Weekly BigData & ML Roundup – Apr. 20, 2017

Examples

Really Awesome GAN
A list of papers on Generative Adversarial (Neural) Networks

The Incredible PyTorch
A curated list of tutorials, papers, projects, communities and more relating to PyTorch.

Neural Storyteller
A recurrent neural network for generating little stories about images

Emoji Intelligence
Neural Network Emoji playground using Swift for iPhone

Toolset

VOTT
An Electron app for building end-to-end Object Detection Models from Sample Videos.

Models

Pretrained Show and Tell Model
A Neural Image Caption Generator implemented in TensorFlow.

Generative Models
Collection of generative models, e.g. GAN, VAE, in PyTorch and TensorFlow.

DeepMind – Differentiable Neural Computer
A TensorFlow implementation of the Differentiable Neural Computer.

Library

BICO
BICO is a fast streaming algorithm and reduction technique for the k-means problem


Weekly BigData & ML Roundup – Apr. 13, 2017

Examples

Where Am I
Uses WiFi signals and machine learning to predict where you are

AWS Lambda Face
Perform deep neural network based face detection and recognition in the cloud (via AWS Lambda) with zero model configuration or tuning.

Awesome Artificial Intelligence (AI)
A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers

DeepLearning Zero To All
TensorFlow Basic Tutorial Labs

Horse Racing Prediction
Using the support vector regression algorithm to predict horse racing results

Neural Complete
A neural network trained to help write neural network code using autocomplete

Toolset

NStack
Type-safe, composable microservices for data analytics written in Haskell

Models

DeepMind DQN 3.0
Lua/Torch implementation of DQN (Nature, 2015)

Customized Attention Span (CAS)
Recurrent neural networks with customized attention spans.

zi2zi
Learning Chinese Character style with conditional GAN

DenseNet
Code for Densely Connected Convolutional Networks (DenseNets)

Library

DeepMind Sonnet
TensorFlow-based neural network library


Weekly BigData & ML Roundup – Apr. 6, 2017

Examples

Flappy ES
Playing Flappy Bird using Evolution Strategies

Kepler’s Machines
Machine Learning Project to discover Exoplanets

TensorFlow Book in R
This is the unofficial code repository for Machine Learning with TensorFlow(R).

Awesome Neuroscience
A curated list of awesome neuroscience libraries, software and any content related to the domain

Cracking the Da Vinci Code with Google interview problems and NLP
A guide on how to crack combinatorics puzzles shown in The Da Vinci Code movie using CS fundamentals and NLP

Models

KMeans Elbow
Code for determining optimal number of clusters for K-means algorithm using the ‘elbow criterion’

CycleGAN
Software that generates photos from paintings, turns horses into zebras, performs style transfer, and more

TensorFlow Autoencoders
Implementations of autoencoder, generative adversarial networks, variational autoencoder and adversarial variational autoencoder

Pointer Networks in TensorFlow
TensorFlow implementation of “Pointer Networks”

 


Weekly BigData & ML Roundup – Mar. 30, 2017

Machine Learning with ARM

In today’s data science and machine learning world, GPU-based acceleration, especially on Nvidia hardware, is the de facto standard for production-level performance. Many factors have contributed to this status quo, and it is hard to deny that solid, consistent SDK support from the company has played a very significant role.

ARM officially launched similar SDK support for hardware-accelerated machine learning as recently as 20 days ago. This suggests that ARM hardware you may already own, such as the Raspberry Pi, has a chance to unlock its full potential in this field. Stay tuned.

Examples

Snaky
A snake game with three versions of AI included, implemented in Python with pygame.

Game2vec
TensorFlow implementation of word2vec applied to a Steam video games dataset

RNN-Tutorial
Recurrent Neural Networks – A Short TensorFlow Tutorial

Google Magenta
Music and Art Generation with Machine Intelligence

Spark Streaming with Google Cloud by @hereticreader
An example of integrating Spark Streaming with Google Pub/Sub and Google Datastore

Toolset

React.js-Jupyter
Jupyter, powered by React.js

Models

Deep Photo Style Transfer
Code and data for paper “Deep Photo Style Transfer”

Evolution Strategies Starter
Starter code for Evolution Strategies

HTMLAI RNN
Train an RNN (Recurrent Neural Network) to generate valid HTML and CSS templates for websites based on character-by-character training

Lime
Explaining the predictions of any machine learning classifier

Libraries

ARM Compute Library
The ARM Computer Vision and Machine Learning library is a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies.

Facebook TorchMPI
Implements a message passing interface (MPI) wrapper that makes it easy to do massively parallel computations inside the Torch deep-learning framework.


Weekly BigData & ML Roundup – Mar. 23, 2017

LLVM for Data Processing
When Apache Spark launched Tungsten, there was a hint of incorporating LLVM into the data processing pipeline to exploit modern CPU features and LLVM's superb machine code generation for a performance boost.

LLVM appears again with Weld, a new code generation project for data analytics. The project claims that TensorFlow, Spark, and NumPy can be accelerated by up to 30x with just a few Weld operations!

Interestingly, Matei Zaharia is on its contributor list.

Examples

Algorithmic Trading Pipeline
Algorithmic Trading Pipeline for Online Betting Markets

Blackbird
Blackbird Bitcoin Arbitrage: a long/short market-neutral strategy

Toolsets

Weld
Weld is a runtime and language for accelerating data analytics frameworks

bq-utils
Utilities for BigQuery, such as downloading a table or query to csv/ndjson/excel/gsheet or a new table, using iterators for a low memory footprint.

Wiki2Vec
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps

ETL Starter Kit
Extract, Transform, Load (ETL) refers to a process in database usage and especially in data warehousing.

Facebook Visdom
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and NumPy.

Vivitics Node (VNode)
A workbench for Data Science powered by Jupyter and Docker

Models

Mozilla DeepSpeech
A TensorFlow implementation of Baidu’s DeepSpeech architecture

DiscoGAN
Official implementation of “Learning to Discover Cross-Domain Relations with Generative Adversarial Networks”

Libraries

Redis-ML
Machine Learning Model Server

UC Berkeley Ray
An experimental distributed execution engine

