Weekly BigData Roundup – July 15, 2016

We can pretty much sum up this week with two highlights: Amazon's Deep Scalable Sparse Tensor Network Engine (DSSTNE) and Yahoo's massively parallel ADMM over Spark. Come check them out!


Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building deep learning (DL) machine learning (ML) models.

Yahoo Spark ADMM
A massively parallel abstract programming framework for solving big data optimization problems through ADMM (the alternating direction method of multipliers) over Spark.
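For readers new to ADMM, here is a minimal sketch of the consensus form of the algorithm in plain Python. It is not Yahoo's API and does not use Spark; the function name `consensus_admm` and the toy objective (each worker `i` holds a number `a_i` and the group minimizes the sum of `0.5 * (x - a_i)^2`, whose solution is the mean) are assumptions made purely for illustration. The point is the shape of the iteration: a per-worker update that could run in parallel, followed by an averaging step that behaves like a reduce.

```python
def consensus_admm(local_targets, rho=1.0, iterations=50):
    """Toy consensus ADMM (scaled form) for minimizing sum_i 0.5 * (x - a_i)^2.

    Hypothetical illustration only: each entry of local_targets plays the
    role of the data held by one worker.
    """
    n = len(local_targets)
    z = 0.0            # shared consensus variable
    u = [0.0] * n      # one scaled dual variable per worker

    for _ in range(iterations):
        # x-update: each worker solves its own small problem independently
        # (this is the step a cluster could run as a parallel map).
        x = [(a + rho * (z - u_i)) / (1.0 + rho)
             for a, u_i in zip(local_targets, u)]

        # z-update: average the workers' proposals (a reduce step).
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n

        # Dual update: each worker records how far it is from consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]

    return z


if __name__ == "__main__":
    # The consensus value should approach the mean of the local targets.
    print(consensus_admm([1.0, 2.0, 9.0]))
```

For this toy objective the consensus value approaches the average of the workers' numbers; real problems swap in a more interesting local objective for the x-update while keeping the same three-step loop.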

Shiny Server
Host Shiny applications over the web.


Fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow.

SparklingPandas aims to make it easy to use the distributed computing power of PySpark to scale your data analysis with Pandas.

Elassandra is a fork of Elasticsearch modified to run on top of Apache Cassandra in a scalable and resilient peer-to-peer architecture.

Spark DateTime Library
A library for exposing DateTime functions from the Joda-Time library as SQL functions. It also provides a DSL for DateTime Catalyst expressions; this utilizes the Scala wrapper library.

Apache Spark Renjin Executor (REX)
REX is an Apache Spark package offering access to the scientific computing power of the R programming language to Spark batch and streaming applications on the JVM.

Analytics Integration, Naturally.

Pure JavaScript implementation of the Avro specification.

A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows.


Web-based, polyglot research notebook platform

Easy interactive web applications with R

An IPython notebook explaining generalized linear models, particularly for count data.

This is a common place for simple tools that the field engineering team @ Hortonworks

Holman Spark
Sparklines for your shell.
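If you have not seen a sparkline before, the idea fits in a few lines. The sketch below is a Python illustration of the general technique, not the tool's own code; the function name `sparkline` and the exact rounding rule are assumptions, so its output may differ from what the shell tool prints for the same numbers.

```python
TICKS = "▁▂▃▄▅▆▇█"  # eight block characters, shortest to tallest


def sparkline(values):
    """Map a sequence of numbers onto block characters by relative height."""
    values = list(values)
    if not values:
        return ""

    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal, so draw a flat line at the lowest tick.
        return TICKS[0] * len(values)

    last = len(TICKS) - 1
    # Scale each value into 0..last and round to the nearest tick index.
    return "".join(TICKS[int((v - lo) * last / span + 0.5)] for v in values)


if __name__ == "__main__":
    print(sparkline([0, 1, 2, 3, 4, 5, 6, 7]))
```

The smallest value always maps to the shortest bar and the largest to the tallest, so the line shows shape rather than absolute magnitude.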

Kafka MySQL Connector
A plugin to the Kafka Connect framework that replicates data from MySQL to Kafka.

Kafka Connect Cassandra
Kafka Connect Cassandra Connector. This project includes source/sink connectors for Cassandra to/from Kafka.


Spark with Avro and Parquet
Enclosed is a simple Spark app demonstrating how to read and write data in the Parquet and Avro formats.


Dask and Scikit-Learn
Model Parallelism

Strangeloop2015 Articles
Architectural patterns of resilient distributed systems.

You can find many more tools, frameworks, and libraries at the PocketCluster Index. Go check it out! Want to add your repo? Tweet @stkim1!

