Weekly BigData Roundup – May 5, 2016


CoreNLP wrapper for Spark
A wrapper that exposes the Stanford CoreNLP annotation pipeline as an Apache Spark ML Transformer.

Distributed Machine Learning Common Codebase
A library of common building blocks for scalable and portable distributed machine learning.

Matrix Shadow
Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning.

Smile (Statistical Machine Intelligence and Learning Engine)
Smile is a set of pure Java libraries implementing various state-of-the-art machine learning algorithms.



Avro RPC Quick Start
A quick start for Apache Avro RPC. Avro began as a subproject of Apache Hadoop and is now a top-level Apache project.

Cook Scheduler
Fair job scheduler on Mesos for batch workloads and Spark.

SVD Benchmarking
A repo for benchmarking distributed implementations of the singular value decomposition.


Anomaly Detector
A streaming anomaly detection system built with Oryx.

Fantasy Football
Choosing a fantasy football team using Spark, Hive, Python, and really just about anything.

LSA of Legal Documents
Latent Semantic Analysis of Legal Documents.

Scikit-Learn Score Example
An example of applying a fitted scikit-learn model to a distributed dataset using PySpark.

Sparkling Pandas Example
Examples of using SparklingPandas and Pandas with PySpark.


You can find many more tools, frameworks, and libraries at the PocketCluster Index. Go check it out! Want to add your repo? Tweet to @stkim1!
