Weekly roundup – Apr. 15, 2016


End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline.


Spark Streaming Blueprint apps
A Spark sbt blueprint for building your own Spark apps.


Streaming SQL for Apache Spark
Manipulate Spark Streaming with SQL.

Data discovery and lineage for the Big Data ecosystem.

Confluent REST Utils
Utilities and a small framework for building REST services with Jersey, Jackson, and Jetty.

Kafka Offset Monitor
A small app to monitor the progress of Kafka consumers and their lag with respect to the queue.
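For each partition, consumer lag is the gap between the partition's log-end offset (the offset the next written message will get) and the offset the consumer group has committed. Below is a minimal Python sketch of that arithmetic. It is not Kafka Offset Monitor's own code. The offsets are hypothetical stand-ins for values a real client would fetch from the brokers, and treating a partition with no committed offset as starting from 0 is an assumption made for illustration.

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: log-end offset minus committed offset.

    Assumption (for illustration only): a partition with no committed
    offset is treated as if the group had committed offset 0.
    """
    lag = {}
    for partition, end in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at 0 so a stale log-end reading never yields negative lag.
        lag[partition] = max(end - committed, 0)
    return lag


if __name__ == "__main__":
    # Hypothetical offsets for a three-partition topic.
    end = {0: 120, 1: 80, 2: 50}
    committed = {0: 100, 1: 80}
    per_partition = consumer_lag(end, committed)
    print(per_partition)                 # {0: 20, 1: 0, 2: 50}
    print(sum(per_partition.values()))   # 70
```

Summing the per-partition values gives the group's total lag for the topic. Lag that keeps growing over time is the usual sign that consumers are falling behind producers.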

A powerful & scriptable shell for Apache ZooKeeper.

HQL (Apache Hive) query language support in Atom
Brings HQL (Apache Hive Query Language) support to the Atom text editor. It also works for SQL-like languages and Pig Latin.

A modern Elasticsearch data browser.

A tool for running Spark on Google Compute Engine.


The Intel® Deep Learning Framework
IDLF is an SDK library for training and executing deep neural networks.

A machine learning package built for humans.

Apache ORC
ORC is a self-describing type-aware columnar file format designed for Hadoop workloads.

A library to test Hive scripts with YARN and MR2.

Distributed System Integration & Performance Testing Library


You can find many more tools, frameworks and libraries at PocketCluster Index. Go check it out! Want to add your repo? Tweet @stkim1!

