Weekly BigData & ML Roundup – Mar. 23, 2017

LLVM for Data Processing
When Apache Spark launched Tungsten, there was a hint of incorporating LLVM into the data processing pipeline, to exploit modern CPU features and LLVM's highly optimized machine code for a performance boost.

LLVM appears again with Weld, a new code generation project for data analytics. The project claims that TensorFlow, Spark, and NumPy can be accelerated by up to 30x with just a few Weld operations!

Interestingly, Matei Zaharia is on its contributor list.

Examples

Algorithmic Trading Pipeline
Algorithmic Trading Pipeline for Online Betting Markets

Blackbird
Blackbird Bitcoin Arbitrage: a long/short market-neutral strategy

Toolsets

Weld
Weld is a runtime and language for accelerating data analytics frameworks

bq-utils
Utilities for BigQuery, such as downloading a table or query to csv/ndjson/excel/gsheet or a new table, using iterators for a low memory footprint.

Wiki2Vec
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps

ETL Starter Kit
Extract, Transform, Load (ETL) refers to a process in database usage and especially in data warehousing.

Facebook Visdom
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.

Vivitics Node (VNode)
Vivitics Node (VNode): A workbench for Data Science powered by Jupyter and Docker

Models

Mozilla DeepSpeech
A TensorFlow implementation of Baidu’s DeepSpeech architecture

DiscoGAN
Official implementation of “Learning to Discover Cross-Domain Relations with Generative Adversarial Networks”

Libraries

Redis-ML
Machine Learning Model Server

UC Berkeley Ray
An experimental distributed execution engine


1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking to add your repo? Tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Weekly BigData & ML Roundup – Mar. 16, 2017

Apache released Ignite v1.9.0 (a high-performance in-memory computation platform) and UIMA Ruta Workbench v2.6.0 (Unstructured Information Management Architecture).

Examples

Face Recognition
The world’s simplest facial recognition API for Python and the command line

Pytorch Tutorial
Tutorial for researchers to learn deep learning with PyTorch

Deep Learning For NLP In Pytorch
An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.

Data Science, Machine Learning, Artificial Intelligence, Big Data, and IoT Resources
A curated set of resources for data science, machine learning, artificial intelligence (AI), big data, internet of things (IoT), and more.

Backpropagation
A multilayer perceptron neural network, implemented with Java Swing, that learns via the backpropagation algorithm.

Toolset

seq2seq
A general-purpose encoder-decoder framework for TensorFlow

Models

Genetic CNN
CNN architecture exploration using a genetic algorithm

TensorFlow Models
Implementations of autoencoder, generative adversarial networks, variational autoencoder and adversarial variational autoencoder

Clickbaits Revisited
Deep learning models to identify clickbaits taking content into consideration

Libraries

ApexNLP
A natural language event parser for Java and Android.

Node Yolo
Node bindings for YOLO/Darknet image recognition library

postagga
A library to parse natural language in pure Clojure and ClojureScript



Weekly BigData & ML Roundup – Mar. 9, 2017

Apache Accumulo, a distributed key/value store, has been updated to v1.8.1, and
Apache Kafka, a distributed messaging framework, has released v0.10.2.0.

Examples

RDD-DF-DS-SSQL
Examples of and differences between various Spark APIs

TF Stanford Tutorials
This repository contains code examples for the course CS 20SI: TensorFlow for Deep Learning Research.

Awesome Sentiment Analysis
A curated list of Sentiment Analysis methods, implementations and misc

Adversarial Nets Papers
Classic papers and code on generative adversarial nets

Papers-I-read
A Paper A Week

Toolset

Bowtie
Create a dashboard with Python!

Models

sentencepiece
An unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training

lexvec
An implementation of the LexVec word embedding model that achieves state-of-the-art results in multiple NLP tasks

FaceRecognition
Implement face recognition using PCA, LDA and LPP

rwa
Machine Learning on Sequential Data Using a Recurrent Weighted Average

Libraries

Faiss
A library for efficient similarity search and clustering of dense vectors.

Tablesaw
The simplest way to slice data in Java

Laurae
Advanced High Performance Data Science Toolbox for R by Laurae

 



Weekly BigData & ML Roundup – Mar. 2, 2017

Examples

And the award goes to…
Oscar prediction with classification algorithms

Pandas Tutorial
Tutorial on Using Pandas

Practical RL
A course in reinforcement learning in the wild

blupig
A serious Gomoku board game AI written in C++

CarND Vehicle Detection
Vehicle Tracking and Detection Project Submitted for Udacity’s CND using Traditional Computer Vision and Machine Learning Techniques

Election History
US Presidential Elections since 1789

Toolsets

Highcharter
R wrapper for highcharts based on htmlwidgets

Airflow Scheduler Failover Controller
A process that runs in unison with Apache Airflow to control the Scheduler process to ensure High Availability

miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Faker
Faker is a Python package that generates fake data for you

Deep Video Analytics
Analyze videos & images, perform detections, index frames & detected objects, search by examples

Models

ML-From-Scratch
Bare bones Python implementations of some of the foundational Machine Learning models and algorithms

deconvfaces
Generating faces with deconvolution networks

fast-neural-style-tensorflow
A TensorFlow implementation of fast neural style!

Libraries

Prophet: Automatic Forecasting Procedure
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth

cucco
Python library for text normalization

xtensor
Multi-dimensional arrays with broadcasting and lazy computing

Kadot
Kadot, the unsupervised natural language processing library

caffe-tensorflow
Caffe models in TensorFlow

neurojs
A JavaScript deep learning and reinforcement learning library

Computational Healthcare
Analyze large healthcare datasets & build machine learning models using TensorFlow



Weekly BigData & ML Roundup – Feb. 23, 2017

Three Apache Projects Updated

Apache Geode, a data management platform, 1.1.0 released.
Apache SINGA, a general distributed deep learning platform, 1.1.0 released.
Apache Storm, a realtime data processing system, 1.0.3 released.

Examples

basic_deep_learning_keras
Your first deep neural network in less than 5 minutes

painters
Winning solution for the Painter by Numbers competition on Kaggle

tiefvision
End-to-end deep learning image-similarity search engine

AdversarialNetsPapers
Classic papers and code on generative adversarial nets

data-science-ipython-notebooks
Continually updated data science Python notebooks: Deep learning, scikit-learn, Kaggle, big data, matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines

A.I. Duet
A piano that responds to you.

Toolsets

elasticsearch-learning-to-rank
Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

incubator-airflow
Airflow is a platform to programmatically author, schedule and monitor workflows.

ipyvolume
3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL

scikit-plot
An intuitive library to add plotting functionality to scikit-learn objects.

CatterPlots
Did you ever wish you could make scatter plots with cat shaped points? Now you can!

Models

DeepLearningImplementations
Implementation of recent Deep Learning papers

parrot
RNN-based generative models for speech.

pix2pix-tensorflow
TensorFlow port of Image-to-Image Translation with Conditional Adversarial Nets

Libraries

bootstrapped
Generate bootstrapped confidence intervals for A/B testing in Python.

SHMArrays
Read and write NumPy arrays stored in shared memory from multiple processes

hadoop-binary-analysis
Framework that makes processing arbitrary binary data in Hadoop easier

mljar-api-python
A simple Python wrapper over the MLJAR API.

convnetjs
Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser.

TensorFlow ecosystem
Integration of TensorFlow with other open-source frameworks

Framework

Netflix genie
Federated Big Data Orchestration Service



Weekly BigData & ML Roundup – Feb. 16, 2017

Apache Ranger, a Big Data security management framework for Hadoop, has been promoted to a Top-Level Apache Project.

Examples

awesome-deep-learning-papers
Awesome – Most Cited Deep Learning Papers

practical-pytorch
Practical PyTorch tutorials, focused on using neural networks for natural language tasks

ml-playground
Place to toy around with different ML models

Numerical-Analysis-Examples
Numerical Analysis Implementations in Various Languages

Toolsets

AirSim
Open source simulator based on Unreal Engine for autonomous vehicles from Microsoft AI & Research

ChosunTruck
Euro Truck Simulator 2 autonomous driving solution

ipdb
Integration of IPython pdb

Bella
A pure Python post-exploitation, data mining, and remote administration tool for macOS

Libraries

TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs onto Apache Spark clusters

glow
Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce

gleam
Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly

dlib
A toolkit for making real world machine learning and data analysis applications in C++

flint
A Time Series Library for Apache Spark

kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Frameworks

ranger
Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform

thrill
An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++



Weekly BigData & ML Roundup – Feb. 9, 2017

Apache Bahir, a set of distributed analytics extensions for Spark and Flink, has released v2.0.2.

The Awesome Network Analysis and DataScienceR entries link directly to their GitHub repositories, since their READMEs are the core content.

Examples

MLPB
Machine Learning Problem Bible

self-driving-car-sim
A self-driving car simulator built with Unity

lectures
Oxford Deep NLP 2017 course

gt-nlp-class
Course materials for Georgia Tech CS 4650 and 7650, “Natural Language”

Awesome Network Analysis
A curated list of awesome network analysis resources

DataScienceR
A curated list of R tutorials for Data Science, NLP and Machine Learning

Models

TensorFlow Fold
Deep learning with dynamic computation graphs in TensorFlow

darkflow
Real-time object detection and classification

ResNeXt
Implementation of a classification framework from the paper Aggregated Residual Transformations for Deep Neural Networks

Libraries

MITIE
Library and tools for information extraction

nmtpy
A suite of Python tools for training neural machine translation networks using Theano

renjin
JVM-based interpreter for the R language for statistical analysis.

reticulate
R Interface to Python

tulipy
Python bindings for Tulip Indicators

DeepDarkFantasy
What if we combine Functional Programming and Deep Learning?

ScenicOverlook
A Python library for incremental, in-memory map-reduces

