Weekly BigData & ML Roundup – Mar. 2, 2017

Examples

And the award goes to…
Oscar prediction with classification algorithms

Pandas Tutorial
Tutorial on Using Pandas

Practical RL
A course in reinforcement learning in the wild

blupig
A serious Gomoku board game AI written in C++

CarND Vehicle Detection
Vehicle Tracking and Detection Project Submitted for Udacity’s CarND using Traditional Computer Vision and Machine Learning Techniques

Election History
US Presidential Elections since 1789

Toolsets

Highcharter
R wrapper for highcharts based on htmlwidgets

Airflow Scheduler Failover Controller
A process that runs in unison with Apache Airflow to control the Scheduler process to ensure High Availability

miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Faker
Faker is a Python package that generates fake data for you

Deep Video Analytics
Analyze videos & images, perform detections, index frames & detected objects, search by examples

Models

ML-From-Scratch
Bare bones Python implementations of some of the foundational Machine Learning models and algorithms

deconvfaces
Generating faces with deconvolution networks

fast-neural-style-tensorflow
A tensorflow implementation for fast neural style!

Libraries

Prophet: Automatic Forecasting Procedure
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth

cucco
Python library for text normalization

xtensor
Multi-dimensional arrays with broadcasting and lazy computing

Kadot
Kadot, the unsupervised natural language processing library

caffe-tensorflow
Caffe models in TensorFlow

neurojs
A javascript deep learning and reinforcement learning library

Computational Healthcare
Analyze large healthcare datasets & build machine learning models using TensorFlow


1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking to add your repo? Tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Feb. 23, 2017

Three Apache Projects Updated

Apache Geode, a data management platform, 1.1.0 released.
Apache SINGA, a general distributed deep learning platform, 1.1.0 released.
Apache Storm, a realtime data processing system, 1.0.3 released.

Examples

basic_deep_learning_keras
Your first deep neural network in less than 5 minutes

painters
Winning solution for the Painter by Numbers competition on Kaggle

tiefvision
End-to-end deep learning image-similarity search engine

AdversarialNetsPapers
The classical papers and codes about generative adversarial nets

data-science-ipython-notebooks
Continually updated data science Python notebooks: Deep learning, scikit-learn, Kaggle, big data, matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines

A.I. Duet
A piano that responds to you.

Toolsets

elasticsearch-learning-to-rank
Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

incubator-airflow
Airflow is a platform to programmatically author, schedule and monitor workflows.

ipyvolume
3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL

scikit-plot
An intuitive library to add plotting functionality to scikit-learn objects.

CatterPlots
Did you ever wish you could make scatter plots with cat shaped points? Now you can!

Models

DeepLearningImplementations
Implementation of recent Deep Learning papers

parrot
RNN-based generative models for speech.

pix2pix-tensorflow
Tensorflow port of Image-to-Image Translation with Conditional Adversarial Nets

Libraries

bootstrapped
Generate bootstrapped confidence intervals for A/B testing in Python.

SHMArrays
Read and write numpy arrays with multiple processes stored as shared memory

hadoop-binary-analysis
Framework that makes processing arbitrary binary data in Hadoop easier

mljar-api-python
A simple python wrapper over MLJAR API.

convnetjs
Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser.

TensorFlow ecosystem
Integration of TensorFlow with other open-source frameworks

Framework

Netflix genie
Federated Big Data Orchestration Service



Weekly BigData & ML Roundup – Feb. 16, 2017

Apache Ranger, a Big Data security management framework for Hadoop, has been promoted to a Top-Level Apache Project.

Examples

awesome-deep-learning-papers
Awesome – Most Cited Deep Learning Papers

practical-pytorch
Practical PyTorch tutorials, focused on using neural networks for natural language tasks

ml-playground
Place to toy around with different ML models

Numerical-Analysis-Examples
Numerical Analysis Implementations in Various Languages

Toolsets

AirSim
Open source simulator based on Unreal Engine for autonomous vehicles from Microsoft AI & Research

ChosunTruck
Euro Truck Simulator 2 autonomous driving solution

ipdb
Integration of IPython pdb

Bella
A pure python, post-exploitation, data mining tool and remote administration tool for macOS

Libraries

TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs onto Apache Spark clusters

glow
Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce

gleam
Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly

dlib
A toolkit for making real world machine learning and data analysis applications in C++

flint
A Time Series Library for Apache Spark

kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Frameworks

ranger
Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform

thrill
An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++



Weekly BigData & ML Roundup – Feb. 9, 2017

Apache Bahir v2.0.2, which provides distributed analytics extensions for Spark and Flink, has been released.

The Awesome Network Analysis and DataScienceR entries link directly to their GitHub repositories, since their READMEs are the core content.

Examples

MLPB
Machine Learning Problem Bible

self-driving-car-sim
A self-driving car simulator built with Unity

lectures
Oxford Deep NLP 2017 course

gt-nlp-class
Course materials for Georgia Tech CS 4650 and 7650, “Natural Language”

Awesome Network Analysis
A curated list of awesome network analysis resources

DataScienceR
a curated list of R tutorials for Data Science, NLP and Machine Learning

Models

TensorFlow Fold
Deep learning with dynamic computation graphs in TensorFlow

darkflow
Real-time object detection and classification

ResNeXt
Implementation of a classification framework from the paper Aggregated Residual Transformations for Deep Neural Networks

Libraries

MITIE
library and tools for information extraction

nmtpy
A suite of Python tools for training neural machine translation networks using Theano

renjin
JVM-based interpreter for the R language for statistical analysis.

reticulate
R Interface to Python

tulipy
Python bindings for Tulip Indicators

DeepDarkFantasy
What if we combine Functional Programming and Deep Learning?

ScenicOverlook
A Python library for incremental, in-memory map-reduces



Weekly BigData & ML Roundup – Feb. 2, 2017

Apache Impala 2.8.0, Apache Kudu 1.2.0, and Apache Parquet 1.8.2 are newly released.

Examples
election-transparency
This project analyzes elections in an effort to identify trends, outliers, and/or anomalies to enable insight and transparency into the democratic voting process

clickbait-detector
Detects clickbait headlines using deep learning

ml-videos
A collection of video resources for machine learning

Toolsets
aleph
Sift through large sets of structured and unstructured data, and find the people and companies you look for

DiagrammeR
Graph and network visualization using tabular data in R

newsflash
Tools to Work with the Internet Archive and GDELT Television Explorer in R

dlbench
A benchmark framework for measuring different deep learning tools

plawt
JSON-like sugar for matplotlib

artificial_seinfeld
Tools for generating artificial Seinfeld episodes using Keras LSTM w/ Hyperparameter Optimization

tensorify
A small Python utility module that provides TensorFlow decorators

Models
PaintsChainer
line drawing colorization using chainer

WassersteinGAN
Code accompanying the paper “Wasserstein GAN”

BiDNN
Bidirectional (Symmetrical) Deep Neural Networks

Libraries
language-detection
A language detection library for PHP. Detects the language from a given text string.

nlp
General purpose any-lang Natural Language Processor that parses the data inside a text and returns a filled model



Weekly BigData & ML Roundup – Jan. 26, 2017

Apache HBase has been updated to 1.3.0.

Examples
BossSensor
Hide screen when boss is approaching

sklearn-deeprl
Deep reinforcement learning. In scikit-learn. In less than 50 effective lines.

Toolsets
OpenRefine
A free, open source power tool for working with messy data and improving it

ml_sampler
Use machine learning to take ‘better’ random samples!

Hollow
A java library and comprehensive toolset for harnessing small to moderately sized in-memory datasets

mocker-data-generator
A simplified way to generate massive mock data based on a schema using the fake/random data generators

Models
dtn-tensorflow
domain transfer network. tensorflow implementation of unsupervised cross-domain image generation

multilayer-perceptron
Library to make and train a concurrent multilayer perceptron

Attention Transfer
Improving Convolutional Networks via Attention Transfer

Libraries
Tars
A simple deep generative model library in Theano and Lasagne

Gota
DataFrames and data wrangling in Go (Golang)

nvParse
Fast, gpu-based CSV parser



Weekly BigData & ML Roundup – Jan. 19, 2017

Having exited the incubator, Apache Beam and Apache Eagle are now Top-Level Apache Projects.
Apache Calcite has recently released v1.11.0.
The National Security Agency has open-sourced Timely, a time-series database with tight analysis stack integration.

Examples
P-Brain.ai
Natural language virtual assistant built from scratch using Node.js + Bootstrap

Housing Prices Prediction using various Regression Models
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home

Using Machine Learning to Identify authors of texts
Language is a set of choices, and speakers and writers tend to fall into habitual, or at least common, choices.

Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities

Toolsets
deepforge
A development IDE for deep learning

itermplot
An awesome iTerm2 backend for Matplotlib, so you can plot directly in your terminal.

etlalchemy
Extract, Transform, Load from Any SQL Database in 4 lines of Code

Models
WaterNet
A convolutional neural network that identifies water in satellite images

Recurrent Neural Network Grammars
Probabilistic models of sentences with explicit phrase structure

language-universal-parser
A multilingual model for dependency parsing using multilingual word clusters and embeddings, token-level language information, and language-specific features

Libraries
OpenNMT
Open-Source Neural Machine Translation in Torch

neupy
A Python library for Artificial Neural Networks and Deep Learning

pytorch
A python package that provides Tensor computation (like numpy) with strong GPU acceleration and Deep Neural Networks built on a tape-based autograd system

minpy
NumPy interface with mixed backend execution

rust-openai
Bindings to OpenAI Gym for Rust language

Framework
NSA Timely
Accumulo backed time series database


Total 1,115 tools, frameworks and libraries at PocketCluster Index today.
Want to add your project? Tweet to @stkim1!
