Weekly BigData & ML Roundup – Feb. 2, 2017

Apache Impala 2.8.0, Apache Kudu 1.2.0, and Apache Parquet 1.8.2 are newly released.

Examples
election-transparency
This project analyzes elections in an effort to identify trends, outliers, and/or anomalies to enable insight and transparency into the democratic voting process

clickbait-detector
Detects clickbait headlines using deep learning

ml-videos
A collection of video resources for machine learning

Toolsets
aleph
Sift through large sets of structured and unstructured data, and find the people and companies you look for

DiagrammeR
Graph and network visualization using tabular data in R

newsflash
Tools to Work with the Internet Archive and GDELT Television Explorer in R

dlbench
A benchmark framework for measuring different deep learning tools

plawt
JSON-like sugar for matplotlib

artificial_seinfeld
Tools for generating artificial Seinfeld episodes using Keras LSTM w/ Hyperparameter Optimization

tensorify
A small Python utility module that provides TensorFlow decorators

Models
PaintsChainer
line drawing colorization using chainer

WassersteinGAN
Code accompanying the paper “Wasserstein GAN”

BiDNN
Bidirectional (Symmetrical) Deep Neural Networks

Libraries
language-detection
A language detection library for PHP. Detects the language from a given text string.

nlp
General purpose any-lang Natural Language Processor that parses the data inside a text and returns a filled model

1000+ tools, frameworks and libraries collected at PocketCluster Index!
Looking into adding your repo? Tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Jan. 26, 2017

Apache HBase has been updated to 1.3.0.

Examples
BossSensor
Hide screen when boss is approaching

sklearn-deeprl
Deep reinforcement learning. In scikit-learn. In less than 50 effective lines.

Toolsets
OpenRefine
A free, open source power tool for working with messy data and improving it

ml_sampler
Use machine learning to take ‘better’ random samples!

Hollow
A Java library and comprehensive toolset for harnessing small to moderately sized in-memory datasets

mocker-data-generator
A simplified way to generate massive mock data based on a schema using the fake/random data generators

Models
dtn-tensorflow
domain transfer network. tensorflow implementation of unsupervised cross-domain image generation

multilayer-perceptron
Library to make and train a concurrent multilayer perceptron

Attention Transfer
Improving Convolutional Networks via Attention Transfer

Libraries
Tars
A simple deep generative model library in Theano and Lasagne

Gota
DataFrames and data wrangling in Go (Golang)

nvParse
Fast, gpu-based CSV parser

Weekly BigData & ML Roundup – Jan. 19, 2017

Having exited the incubator, Apache Beam and Apache Eagle are now Top-Level Apache Projects.
Apache Calcite has recently released v1.11.0.
The National Security Agency has open-sourced Timely, a time-series database with tight analysis stack integration.

Examples
P-Brain.ai
Natural language virtual assistant built from scratch using Node.js + Bootstrap

Housing Prices Prediction using various Regression Models
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home

Using Machine Learning to Identify authors of texts
Language is a set of choices, and speakers and writers tend to fall into habitual, or at least common, choices.

Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities

Toolsets
deepforge
A development IDE for deep learning

itermplot
An awesome iTerm2 backend for Matplotlib, so you can plot directly in your terminal.

etlalchemy
Extract, Transform, Load from Any SQL Database in 4 lines of Code

Models
WaterNet
A convolutional neural network that identifies water in satellite images

Recurrent Neural Network Grammars
Probabilistic models of sentences with explicit phrase structure

language-universal-parser
A multilingual model for dependency parsing using multilingual word clusters and embeddings, token-level language information, and language-specific features

Libraries
OpenNMT
Open-Source Neural Machine Translation in Torch

neupy
A Python library for Artificial Neural Networks and Deep Learning

pytorch
A python package that provides Tensor computation (like numpy) with strong GPU acceleration and Deep Neural Networks built on a tape-based autograd system

minpy
NumPy interface with mixed backend execution

rust-openai
Bindings to OpenAI Gym for Rust language

Framework
NSA Timely
Accumulo backed time series database


Total 1,115 tools, frameworks and libraries at PocketCluster Index today.
Like to add your project? Tweet to @stkim1!

Weekly BigData & ML Roundup – Jan. 12, 2017

Google TensorFlow has reached v1.0.0 Alpha, and Apache OpenNLP 1.7.0 has been released. In addition, the Apache Software Foundation has recently promoted Apache Beam™ and Eagle™ to top-level projects.

A new email theme is now being used for the weekly roundup. Subscribe to get the weekly summary in your mailbox!

Examples

MuGo
Replicating AlphaGo’s architecture in a readable manner.

Estimation-of-Remaining-Useful-Life
CNN based regression approach for estimation of machinery’s remaining useful life.

deep-murasaki
Deep learning chess engine, that has no idea about chess rules, but watches and learns.

Toolset

GIS tools for Hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

Models

Deep Text Corrector
Deep learning models trained to correct input errors in short, message-like text.

CAPTCHA Recognition with Active Deep Learning
An Active Deep Learning strategy gaining new training data without any human intervention.

unet-color
Deep colorizer based on unet architecture (Encoder decoder with skip connections).

Libraries

Cranium
A portable, header-only, artificial neural network library written in C99.

PyFlux
Open source time series library for Python.

word2vec4everything
word2vec for (almost) everything.


Like to add your project? Any suggestions or feedback? Send them to stkim1@pocketcluster.io or tweet @stkim1.

Looking for more Big Data or Machine Learning repositories? You can find a lot more tools, frameworks and libraries at PocketCluster Index.

Weekly BigData & ML Roundup – Jan. 5, 2017

Intel has released BigDL, a Deep Learning library, for Spark. In addition, Apache Knox is updated to 0.11.0, Apache NiFi to 1.1.1, and Apache Streams to 0.4.1.

Examples

TensorKart
Self-driving MarioKart with TensorFlow

Have Fun with Machine Learning: A Guide for Beginners
An absolute beginner’s guide to Machine Learning and Image Classification with Neural Networks

Sudoku
Can Convolutional Neural Networks Crack Sudoku Puzzles?

MNIST ASCII challenge
A funny challenge to solve CAPTCHA by Machine Learning or Computer Hacking (or both)

Conceptviz
Gallery of Concept Visualization

Markover
Natural Language Generation with Markov

100 NLP Papers
100 Must-Read NLProc Papers

DeepChess
Chess implemented in TensorFlow to experiment with Deep Learning methods

Toolsets

Vega
A visualization grammar and declarative format for creating and saving interactive visualization designs

Chart.js
Simple HTML5 Charts using the canvas tag

Mozilla MetricsGraphics.js
A library optimized for concise and principled data graphics and layouts.

Parallel Coordinates
A d3-based parallel coordinates plot in canvas

ClickHouse
A free analytic DBMS for big data.

Libraries

Intel BigDL
Distributed Deep learning Library for Apache Spark

Jupyter Scala
Lightweight Scala kernel for Jupyter / IPython 3

DeepLearning.scala
A DSL for creating complex neural networks

Spark Google Spreadsheets
Google Spreadsheets datasource for SparkSQL and DataFrames

IndexR
A columnar storage format for fast, real-time analysis of big data.

Gago
Golang genetic algorithm library

Goml
On-line Machine Learning in Go (and so much more)

Gorgonia
A library that helps facilitate machine learning in Go.

Libsvm
A simple, easy-to-use, and efficient software for SVM classification and regression

Recurrent.js
Deep Recurrent Neural Networks and LSTMs in Javascript. More generally also arbitrary expression graphs with automatic differentiation.


900+ Repositories

Readers have reached out to ask how many repositories the index site holds, and why it seems to show only a small number of them.

As of today, the index tracks more than 900 BigData and Machine Learning repositories and updates their status daily. That is not obvious at first glance, however, and the site design does not help: there is no search, no project count, and a few broken links here and there.

Not every issue can be fixed right away, but we can definitely spare some time for a few low-hanging ones. Today we look into the infinite scroll.

The site scrolls infinitely, all the way to the end, so you do not need to click through next pages or indexes to move around. It is built this way so that you can scroll through the entire collection of 900+ repositories and pick the most appealing ones as quickly as possible.

Nevertheless, it has a few glitches. First, it is not easy to see whether there are more pages to go.

To address that, a spinning indicator now appears at the bottom of the page while the next page is loading and there is more to see.
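
For the curious, here is a minimal sketch of the idea in TypeScript. It is not the index site's actual code: the `InfiniteScroller` class, the `RepoPage` type, and the `fakeFetch` data source are all hypothetical names made up for illustration. It only models the rule described above, which is that the spinner shows while a page request is in flight and more pages may remain.

```typescript
// Hypothetical model of an infinite-scroll pager; none of these names come from the site.
type RepoPage<T> = { items: T[]; hasMore: boolean };

class InfiniteScroller<T> {
  items: T[] = [];
  loading = false;
  hasMore = true;
  private nextPage = 0;
  private readonly fetchPage: (page: number) => Promise<RepoPage<T>>;

  constructor(fetchPage: (page: number) => Promise<RepoPage<T>>) {
    this.fetchPage = fetchPage;
  }

  // The spinner is visible only while a request is in flight and more pages may exist.
  get showSpinner(): boolean {
    return this.loading && this.hasMore;
  }

  // Call this whenever the reader scrolls near the bottom of the list.
  async loadNext(): Promise<void> {
    if (this.loading || !this.hasMore) return; // ignore duplicate or pointless triggers
    this.loading = true;
    try {
      const page = await this.fetchPage(this.nextPage);
      this.items.push(...page.items);
      this.hasMore = page.hasMore;
      this.nextPage += 1;
    } finally {
      this.loading = false;
    }
  }
}

// Stand-in data source (an assumption): five fake repositories, served two per page.
const repos = ["a", "b", "c", "d", "e"];
async function fakeFetch(page: number): Promise<RepoPage<string>> {
  const start = page * 2;
  return { items: repos.slice(start, start + 2), hasMore: start + 2 < repos.length };
}

const scroller = new InfiniteScroller<string>(fakeFetch);
const firstLoad = scroller.loadNext();
console.log(scroller.showSpinner); // true while the first page is still loading
firstLoad.then(() => console.log(scroller.items, scroller.showSpinner));
```

In a real page, `loadNext()` would be wired to a scroll listener or an intersection observer, and `showSpinner` would toggle the indicator element at the bottom of the list.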

Second, when you click a project and come back, the site loses its original position and refreshes its content. This is rather irritating, and it would be ideal to keep the context of the old position. A few hours of investigation revealed that this involves a bit more work than expected. (Here’s a good link you might like to read about it.)

It will be looked at again (promise!). For now, clicking a project opens its details in a new tab, so you keep your old context for further searching.

Hope you like the update, and please let me know if you have further questions!


Weekly BigData & ML Roundup – Dec. 29, 2016

Apache Edgent, an analytics framework on edge devices, has recently reached the 1.0.0 milestone. Also, Facebook has recently released Beringei – a new storage engine specifically for time-series data.

Since this is the last round-up of the year, I’d like to take the opportunity to thank you for visiting, subscribing, and reaching out to me with suggestions and encouragement. The weekly round-up will continue next year. Stay tuned!

Examples

NBA Player Movement
Visualization and analysis of NBA player tracking data

Google Youtube-8M
Starter code for working with the YouTube-8M dataset

Toolsets

Sergeant
Tools to Transform and Query Data with Apache Drill (REST API, JDBC Interface, dplyr and DBI Interfaces) in R

Content Data Store
A system that provides storage facilities for massive data sets in the form of images, PDFs, documents, and scanned documents

Sematext Solr-Researcher
Solr SearchComponent for altering and re-executing queries that produce poor results

Naniar
Tools for numerical and visual summaries of NAs

Models

ByteNet
A tensorflow implementation of French-to-English machine translation using DeepMind’s ByteNet

Neural Painter
Paint artistic patterns using random neural network

VAE-Clustering
Unsupervised clustering with (Gaussian mixture) VAEs

Libraries

Fregata
A lightweight, super fast, large-scale machine learning library on Apache Spark

Intel pWord2Vec
Parallelizing word2vec in shared and distributed memory

Cortex
Machine learning in Clojure

Tulip Indicators
A library of functions for technical analysis of financial data

Genann
Simple neural network library in ANSI C

Frameworks

Facebook Beringei (incubating)
A high performance, in-memory storage engine for time series data.

Apache Edgent (incubating)
An open source stream processing programming model and lightweight micro-kernel style runtime for edge devices that enables you to analyze data and events at the device


*https://pocketcluster.wordpress.com will move to https://blog.pocketcluster.io on Jan 1, 2017. If you have subscribed to this blog, please make sure to update the feed address.
