Weekly BigData & ML Roundup – Jan. 5, 2017

Intel has released BigDL, a deep learning library for Apache Spark. In addition, Apache Knox has been updated to 0.11.0, Apache NiFi to 1.1.1, and Apache Streams to 0.4.1.

Examples

TensorKart
Self-driving MarioKart with TensorFlow


Have Fun with Machine Learning: A Guide for Beginners
An absolute beginner’s guide to Machine Learning and Image Classification with Neural Networks

Sudoku
Can Convolutional Neural Networks Crack Sudoku Puzzles?

MNIST ASCII challenge
A fun challenge: solve a CAPTCHA with machine learning, computer hacking, or both

Conceptviz
Gallery of Concept Visualization


Markover
Natural Language Generation with Markov chains

100 NLP Papers
100 Must-Read NLProc Papers

DeepChess
Chess implemented in TensorFlow to experiment with Deep Learning methods

Toolsets

Vega
A visualization grammar and declarative format for creating and saving interactive visualization designs

Chart.js
Simple HTML5 Charts using the canvas tag

Mozilla MetricsGraphics.js
A library optimized for concise and principled data graphics and layouts.

Parallel Coordinates
A d3-based parallel coordinates plot in canvas

ClickHouse
A free analytic DBMS for big data.

Libraries

Intel BigDL
Distributed deep learning library for Apache Spark

Jupyter Scala
Lightweight Scala kernel for Jupyter / IPython 3

DeepLearning.scala
A DSL for creating complex neural networks

Spark Google Spreadsheets
Google Spreadsheets datasource for SparkSQL and DataFrames

IndexR
A columnar storage format for fast, real-time analysis of big data.

Gago
Golang genetic algorithm library

Goml
On-line Machine Learning in Go (and so much more)

Gorgonia
A library that helps facilitate machine learning in Go.

Libsvm
Simple, easy-to-use, and efficient software for SVM classification and regression

Recurrent.js
Deep Recurrent Neural Networks and LSTMs in JavaScript. More generally, also arbitrary expression graphs with automatic differentiation.


Would you like to add your project? Have a suggestion or comment? Send your feedback to stkim1@pocketcluster.io or tweet @stkim1.

Looking for more BigData or Machine Learning repositories? You can find a lot more tools, frameworks and libraries at PocketCluster Index.

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

900+ Repositories

Readers have reached out to ask how many repositories the index site has. They also wondered why it seems to show only a small number of them.


As of today, the index tracks more than 900 BigData and Machine Learning repositories and updates their status daily. That isn't obvious at first glance, however, and the site design does not help either: there is no search, no project count, and a few broken links here and there.

Although not all of these issues can be fixed right away, we can certainly spare some time for a few low-hanging, easy ones. Today we looked into the infinite scroll.

The site is designed to scroll infinitely, all the way to the end. You do not need to click through to the next page or an index to move around. It is built this way so you can scroll through the entire collection of 900+ repositories and pick the most appealing ones as quickly as possible.

Nevertheless, it has a few glitches. First, it is not easy to tell whether there are more pages to go.


To address that, a spinning indicator now appears at the bottom of the page while the next page is loading and there is more to see.
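For readers curious how this kind of feed typically works, here is a minimal sketch of the paging logic in TypeScript. It is not the site's actual code: the `FeedState` shape, the 200-pixel threshold, and all function names are hypothetical. The idea is to request the next page only when the reader nears the bottom, to never fire two requests at once, and to show the spinner only while a page is loading and more remain.

```typescript
// Hypothetical sketch of infinite-scroll paging state (not the PocketCluster
// Index source). Items are plain strings for simplicity.
interface FeedState {
  items: string[];  // repositories loaded so far
  nextPage: number; // index of the next page to request
  loading: boolean; // true while a page request is in flight
  hasMore: boolean; // false once the server returns a short page
}

interface ScrollPosition {
  scrollTop: number;      // pixels scrolled from the top
  viewportHeight: number; // visible height of the window
  contentHeight: number;  // total height of the rendered list
}

const initialState = (): FeedState => ({ items: [], nextPage: 0, loading: false, hasMore: true });

// Load the next page only when the reader is near the bottom, nothing is
// already loading, and more pages remain.
function shouldLoadMore(state: FeedState, pos: ScrollPosition, threshold = 200): boolean {
  const nearBottom = pos.scrollTop + pos.viewportHeight >= pos.contentHeight - threshold;
  return nearBottom && !state.loading && state.hasMore;
}

// Mark a request as started; the spinner is shown while this flag is set.
function startLoading(state: FeedState): FeedState {
  return { ...state, loading: true };
}

// Append a fetched page. A page shorter than pageSize means we reached the end.
function applyPage(state: FeedState, page: string[], pageSize: number): FeedState {
  return {
    items: [...state.items, ...page],
    nextPage: state.nextPage + 1,
    loading: false,
    hasMore: page.length === pageSize,
  };
}

// Whether the loading indicator at the bottom of the page should be visible.
function showSpinner(state: FeedState): boolean {
  return state.loading && state.hasMore;
}
```

In a browser, a scroll listener could call `shouldLoadMore` with `window.scrollY`, `window.innerHeight`, and `document.body.scrollHeight`, then fetch the next page and pass the result to `applyPage`. Returning new state objects instead of mutating keeps each step easy to test in isolation.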

Second, when you click a project and come back, the site loses its original position and reloads its content. This is rather irritating; ideally, the page would return you to where you left off. A few hours of investigation revealed that this involves more work than expected. (Here’s a good link you might like to read about it.)

We will look at it again (promise!). For now, clicking a project opens its details in a new tab, so you keep your place in the list and can continue browsing.
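Why is restoring the position more work than it sounds? With an infinite list, the scroll offset alone is not enough: the page is rebuilt from scratch on return, so it must first re-load as many pages as the reader had seen. A minimal sketch in TypeScript, assuming the browser's `sessionStorage` (or any object with `getItem`/`setItem`); the `ListContext` record, the storage key, and the `MemoryStore` helper are hypothetical and not how the index implements or plans to implement this.

```typescript
// Subset of the Web Storage API; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical record of what must survive a round trip to a project page.
interface ListContext {
  pagesLoaded: number; // pages to re-fetch before scrolling
  scrollTop: number;   // offset to restore once those pages are rendered
}

const CONTEXT_KEY = "index-list-context"; // hypothetical storage key

// Call just before navigating away to a project.
function saveContext(store: KeyValueStore, ctx: ListContext): void {
  store.setItem(CONTEXT_KEY, JSON.stringify(ctx));
}

// Call on page load; returns null if nothing valid was saved.
function loadContext(store: KeyValueStore): ListContext | null {
  const raw = store.getItem(CONTEXT_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<ListContext>;
    if (typeof parsed.pagesLoaded !== "number" || typeof parsed.scrollTop !== "number") return null;
    return { pagesLoaded: parsed.pagesLoaded, scrollTop: parsed.scrollTop };
  } catch {
    return null; // unreadable entry: fall back to a fresh list
  }
}

// In-memory store for environments without sessionStorage (e.g. tests).
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.has(key) ? (this.data.get(key) as string) : null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

On return, the page would re-fetch `pagesLoaded` pages and then call `window.scrollTo(0, scrollTop)`. Opening projects in a new tab sidesteps all of this, because the original tab is never unloaded.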


I hope you like the update. Please let me know if you have any further questions!



Weekly BigData & ML Roundup – Dec. 29, 2016

Apache Edgent, an analytics framework for edge devices, has recently reached the 1.0.0 milestone. Also, Facebook has recently released Beringei, a new storage engine specifically for time-series data.

Since this is the last round-up of the year, I’d like to take this opportunity to thank you for visiting, subscribing, and reaching out to me with suggestions and encouragement. The weekly round-up will continue next year. Stay tuned!

Examples

NBA Player Movement
Visualization and analysis of NBA player tracking data

Google Youtube-8M
Starter code for working with the YouTube-8M dataset

Toolsets

Sergeant
Tools to transform and query data with Apache Drill in R, via its REST API, JDBC interface, and dplyr and DBI interfaces

Content Data Store
A system that provides storage for massive data sets in the form of images, PDFs, documents, and scanned documents

Sematext Solr-Researcher
Solr SearchComponent for altering and re-executing queries that produce poor results

Naniar
Tools for numerical and visual summaries of NAs

Models

ByteNet
A TensorFlow implementation of French-to-English machine translation using DeepMind’s ByteNet

Neural Painter
Paint artistic patterns using a random neural network

VAE-Clustering
Unsupervised clustering with (Gaussian mixture) VAEs

Libraries

Fregata
A lightweight, super-fast, large-scale machine learning library on Apache Spark

Intel pWord2Vec
Parallelizing word2vec in shared and distributed memory

Cortex
Machine learning in Clojure

Tulip Indicators
A library of functions for technical analysis of financial data

Genann
Simple neural network library in ANSI C

Frameworks

Facebook Beringei (incubating)
A high performance, in-memory storage engine for time series data.

Apache Edgent (incubating)
An open source stream processing programming model and lightweight micro-kernel style runtime for edge devices that enables you to analyze data and events at the device


*https://pocketcluster.wordpress.com will move to https://blog.pocketcluster.io on Jan 1, 2017. If you have subscribed to this blog, please make sure to change the feed address.


Weekly BigData & ML Roundup – Dec. 22, 2016

The author of Tensorflow Speech Recognition, @pannous, is looking for contributors to his project.

Examples

Serenata de Amor
Fighting corruption with data and SCIENCE!

Self-Driving-Car
Self Driving (Toy) Ferrari with TensorFlow

Deep Learning Papers
Papers about deep learning, ordered by task and date. Current state-of-the-art papers are labelled.

Toolset

Mirage
GUI for Elasticsearch Queries

Model

Tensorflow Speech Recognition
Speech recognition using Google’s TensorFlow deep learning framework and sequence-to-sequence neural networks

Prediction Template Learning
Online machine learning algorithm that makes predictions about the environment to improve itself

DCGAN Tensorflow
A TensorFlow implementation of Deep Convolutional Generative Adversarial Networks

Libraries

WordVectors
Pre-trained word vectors of 30+ languages

TensorBuilder
A TensorFlow library that enables you to easily create complex deep neural networks by leveraging the phi DSL to define their structure

Linkedin Goavro
A golang library that implements encoding and decoding of Avro data

TFLearn
Deep learning library featuring a higher-level API for TensorFlow

Deep-Pwning
Metasploit for machine learning



Weekly BigData & ML Roundup – Dec. 15, 2016

Example
Natural Language Processing in 10 Lines of Code
PyCon 2016 workshop Natural Language Processing in 10 Lines of Code


Toolset
Pinpoint
Pinpoint is an open source APM (Application Performance Management) tool for large-scale distributed systems written in Java.


Models
Tensorflow QRNN
QRNN implementation for TensorFlow

Non-Stationary Bandits
Non-stationary bandits for experiments with reinforcement learning

EvolvingAI PPGN
Code for paper “Plug and Play Generative Networks”

Audio Style Transfer
TensorFlow implementation for audio neural style

TF-Genetic
Evolutionary Neural Networks backed by TensorFlow and pure Python


Libraries
HIP
Convert CUDA to Portable C++ Code

Darknet
Convolutional Neural Networks

ELFI
Engine for Likelihood-Free Inference

Concepts
Formal Concept Analysis with Python



Weekly BigData & ML Roundup – Dec. 8, 2016

This week opens with Apache Drill updated to v1.9 and the release of two open-source AI training platforms: Lab by DeepMind and Universe by OpenAI.

Example

DeepMind Learning to Learn
Learning to Learn in TensorFlow

Toolsets

Appbaseio Gem
GUI for Data Modeling with Elasticsearch

TorchCraft
Connecting Torch to StarCraft

PDD
Advanced Bloom Filter Based Algorithms for Efficient Approximate Data De-Duplication in Streams

Models

Yahoo FEL
Fast Entity Linker toolkit for training models to link entities in documents and queries to a knowledge base (Wikipedia).

Speech-to-Text-WaveNet
End-to-end sentence level English speech recognition based on DeepMind’s WaveNet and TensorFlow

Chainer DFI
Implementation of Deep Feature Interpolation

Libraries

Peloton
The Self-Driving Database Management System

OpenAI Gym
A toolkit for developing and comparing reinforcement learning algorithms

OpenAI Cleverhans
A library for benchmarking vulnerability to adversarial examples

Frameworks

Apache Drill
A distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems

DeepMind Lab
A customisable 3D platform for agent-based AI research

OpenAI Universe
A software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications



Weekly BigData & ML Roundup – Dec. 1, 2016

Comma.ai has open-sourced its self-driving agent, OpenPilot.

Examples

Comma.ai OpenPilot
Open source driving agent

SnakeGame
A classic game of snake that is controlled by a neural network and trained using a genetic algorithm

Models

Twitter AnomalyDetection
Anomaly Detection with R by Twitter

A Deep Learning Model for Cancer Data
TensorFlow Deep Learning neural network model for University of Wisconsin Cancer data

Deep Recommend System
Deep learning recommender system with TensorFlow

Libraries

Entity Resolution for Apache Spark
Collection of some algorithms for entity resolution

Elastic R Client
R client for the Elasticsearch HTTP API

BraveyJS
A simple JavaScript NLP-like library to help you create your own bot.

Tiny DNN
Header only, dependency-free deep learning framework in C++11

Toolset

DataFire
DataFire is an open source API integration framework: think Grunt for APIs, or Zapier for the command line.

