Weekly BigData & ML Roundup – Feb. 23, 2017

Three Apache Projects Updated

Apache Geode, a data management platform, 1.1.0 released.
Apache SINGA, a general distributed deep learning platform, 1.1.0 released.
Apache Storm, a realtime data processing system, 1.0.3 released.

Examples

basic_deep_learning_keras
Your first deep neural network in less than 5 minutes

painters
Winning solution for the Painter by Numbers competition on Kaggle

tiefvision
End-to-end deep learning image-similarity search engine

AdversarialNetsPapers
Classic papers and code on generative adversarial nets

data-science-ipython-notebooks
Continually updated data science Python notebooks: Deep learning, scikit-learn, Kaggle, big data, matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines

A.I. Duet
A piano that responds to you.

Toolsets

elasticsearch-learning-to-rank
Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

incubator-airflow
Airflow is a platform to programmatically author, schedule and monitor workflows.

ipyvolume
3D plotting for Python in the Jupyter notebook based on IPython widgets using WebGL

scikit-plot
An intuitive library to add plotting functionality to scikit-learn objects.

CatterPlots
Did you ever wish you could make scatter plots with cat shaped points? Now you can!

Models

DeepLearningImplementations
Implementation of recent Deep Learning papers

parrot
RNN-based generative models for speech.

pix2pix-tensorflow
TensorFlow port of Image-to-Image Translation with Conditional Adversarial Nets

Libraries

bootstrapped
Generate bootstrapped confidence intervals for A/B testing in Python.

SHMArrays
Read and write NumPy arrays stored in shared memory from multiple processes

hadoop-binary-analysis
Framework that makes processing arbitrary binary data in Hadoop easier

mljar-api-python
A simple Python wrapper over the MLJAR API.

convnetjs
Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser.

TensorFlow ecosystem
Integration of TensorFlow with other open-source frameworks

Framework

Netflix genie
Federated Big Data Orchestration Service


1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking to add your repo? Tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Feb. 16, 2017

Apache Ranger, a Big Data security management framework for Hadoop, has been promoted to a Top-Level Apache Project.

Examples

awesome-deep-learning-papers
Awesome – Most Cited Deep Learning Papers

practical-pytorch
Practical PyTorch tutorials, focused on using neural networks for natural language tasks

ml-playground
Place to toy around with different ML models

Numerical-Analysis-Examples
Numerical Analysis Implementations in Various Languages

Toolsets

AirSim
Open source simulator based on Unreal Engine for autonomous vehicles from Microsoft AI & Research

ChosunTruck
Euro Truck Simulator 2 autonomous driving solution

ipdb
Integration of IPython pdb

Bella
A pure Python post-exploitation, data mining, and remote administration tool for macOS

Libraries

TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs onto Apache Spark clusters

glow
Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce

gleam
Fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk, written in pure Go, runs standalone or distributed

dlib
A toolkit for making real world machine learning and data analysis applications in C++

flint
A Time Series Library for Apache Spark

kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Frameworks

ranger
Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform

thrill
An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++



Weekly BigData & ML Roundup – Feb. 9, 2017

Apache Bahir v2.0.2, which provides distributed analytics extensions for Spark and Flink, has been released.

The Awesome Network Analysis and DataScienceR entries link directly to their GitHub repositories, because the README is the core content of each.

Examples

MLPB
Machine Learning Problem Bible

self-driving-car-sim
A self-driving car simulator built with Unity

lectures
Oxford Deep NLP 2017 course

gt-nlp-class
Course materials for Georgia Tech CS 4650 and 7650, “Natural Language”

Awesome Network Analysis
A curated list of awesome network analysis resources

DataScienceR
A curated list of R tutorials for Data Science, NLP and Machine Learning

Models

TensorFlow Fold
Deep learning with dynamic computation graphs in TensorFlow

darkflow
Real-time object detection and classification

ResNeXt
Implementation of a classification framework from the paper Aggregated Residual Transformations for Deep Neural Networks

Libraries

MITIE
Library and tools for information extraction

nmtpy
A suite of Python tools for training neural machine translation networks using Theano

renjin
JVM-based interpreter for the R language for statistical analysis.

reticulate
R Interface to Python

tulipy
Python bindings for Tulip Indicators

DeepDarkFantasy
What if we combine Functional Programming and Deep Learning?

ScenicOverlook
A Python library for incremental, in-memory map-reduces



Weekly BigData & ML Roundup – Feb. 2, 2017

Apache Impala 2.8.0, Apache Kudu 1.2.0, and Apache Parquet 1.8.2 are newly released.

Examples

election-transparency
This project analyzes elections in an effort to identify trends, outliers, and/or anomalies to enable insight and transparency into the democratic voting process

clickbait-detector
Detects clickbait headlines using deep learning

ml-videos
A collection of video resources for machine learning

Toolsets

aleph
Sift through large sets of structured and unstructured data, and find the people and companies you are looking for

DiagrammeR
Graph and network visualization using tabular data in R

newsflash
Tools to Work with the Internet Archive and GDELT Television Explorer in R

dlbench
A benchmark framework for measuring different deep learning tools

plawt
JSON-like sugar for matplotlib

artificial_seinfeld
Tools for generating artificial Seinfeld episodes using Keras LSTM w/ Hyperparameter Optimization

tensorify
A small Python utility module that provides TensorFlow decorators

Models

PaintsChainer
Line drawing colorization using Chainer

WassersteinGAN
Code accompanying the paper “Wasserstein GAN”

BiDNN
Bidirectional (Symmetrical) Deep Neural Networks

Libraries

language-detection
A language detection library for PHP. Detects the language from a given text string.

nlp
General-purpose, any-language natural language processor that parses the data inside a text and returns a filled model



Weekly BigData & ML Roundup – Jan. 26, 2017

Apache HBase has been updated to 1.3.0.

Examples

BossSensor
Hide screen when boss is approaching

sklearn-deeprl
Deep reinforcement learning. In scikit-learn. In less than 50 effective lines.

Toolsets

OpenRefine
A free, open source power tool for working with messy data and improving it

ml_sampler
Use machine learning to take ‘better’ random samples!

Hollow
A Java library and comprehensive toolset for harnessing small to moderately sized in-memory datasets

mocker-data-generator
A simplified way to generate massive mock data based on a schema using fake/random data generators

Models

dtn-tensorflow
Domain Transfer Network: TensorFlow implementation of unsupervised cross-domain image generation

multilayer-perceptron
Library to make and train a concurrent multilayer perceptron

Attention Transfer
Improving Convolutional Networks via Attention Transfer

Libraries

Tars
A simple deep generative model library in Theano and Lasagne

Gota
DataFrames and data wrangling in Go (Golang)

nvParse
Fast, GPU-based CSV parser



Weekly BigData & ML Roundup – Jan. 19, 2017

Apache Beam and Apache Eagle have graduated from the incubator and are now Top-Level Apache Projects.
Apache Calcite has recently released v1.11.0.
The National Security Agency has open-sourced Timely, a time-series database with tight analysis stack integration.

Examples

P-Brain.ai
Natural language virtual assistant built from scratch using Node.js + Bootstrap

Housing Prices Prediction using various Regression Models
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home

Using Machine Learning to Identify authors of texts
Language is a set of choices, and speakers and writers tend to fall into habitual, or at least common, choices.

Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities

Toolsets

deepforge
A development IDE for deep learning

itermplot
An awesome iTerm2 backend for Matplotlib, so you can plot directly in your terminal.

etlalchemy
Extract, Transform, Load from Any SQL Database in 4 lines of Code

Models

WaterNet
A convolutional neural network that identifies water in satellite images

Recurrent Neural Network Grammars
Probabilistic models of sentences with explicit phrase structure

language-universal-parser
A multilingual model for dependency parsing using multilingual word clusters and embeddings, token-level language information, and language-specific features

Libraries

OpenNMT
Open-Source Neural Machine Translation in Torch

neupy
A Python library for Artificial Neural Networks and Deep Learning

pytorch
A Python package that provides Tensor computation (like NumPy) with strong GPU acceleration and Deep Neural Networks built on a tape-based autograd system

minpy
NumPy interface with mixed backend execution

rust-openai
Bindings to OpenAI Gym for the Rust language

Framework

NSA Timely
Accumulo-backed time series database


A total of 1,115 tools, frameworks and libraries are listed at PocketCluster Index today.
Like to add your project? Tweet to @stkim1!


Weekly BigData & ML Roundup – Jan. 12, 2017

Google TensorFlow has reached v1.0.0 Alpha, and Apache OpenNLP 1.7.0 has been released. In addition, the Apache Software Foundation has recently promoted Apache Beam™ and Eagle™ to top-level projects.

The weekly roundup now uses a new email theme. Subscribe to get the weekly summary in your mailbox!

Examples

MuGo
Replicating AlphaGo’s architecture in a readable manner.

Estimation-of-Remaining-Useful-Life
CNN based regression approach for estimation of machinery’s remaining useful life.

deep-murasaki
Deep learning chess engine that has no idea about chess rules, but watches and learns.

Toolset

GIS tools for Hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

Models

Deep Text Corrector
Deep learning models trained to correct input errors in short, message-like text.

CAPTCHA Recognition with Active Deep Learning
An Active Deep Learning strategy gaining new training data without any human intervention.

unet-color
Deep colorizer based on the U-Net architecture (encoder-decoder with skip connections).

Libraries

Cranium
A portable, header-only, artificial neural network library written in C99.

PyFlux
Open source time series library for Python.

word2vec4everything
word2vec for (almost) everything.


Like to add your project, or have a suggestion or feedback? Email stkim1@pocketcluster.io or tweet @stkim1.

Looking for more Big Data or Machine Learning repositories? You can find a lot more tools, frameworks and libraries at PocketCluster Index.
