End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline.
Spark Streaming Blueprint apps
An sbt blueprint for building your own Spark apps.
Streaming SQL for Apache Spark
Manipulate Spark Streaming with SQL.
Data discovery and lineage for the big data ecosystem.
Confluent REST Utils
Utilities and a small framework for building REST services with Jersey, Jackson, and Jetty.
Kafka Offset Monitor
A small app to monitor the progress of Kafka consumers and their lag relative to the queue.
A powerful & scriptable shell for Apache ZooKeeper.
HQL (Apache Hive) query language support in Atom
Brings HQL (Apache Hive query language) support to the Atom text editor. Also works for SQL-like languages and Pig Latin.
A modern Elasticsearch data browser.
A tool for running Spark on Google Compute Engine.
The Intel® Deep Learning Framework
IDLF is an SDK library for deep neural network training and execution.
A machine learning package built for humans.
ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads.
A library to test Hive scripts with YARN and MR2.
Distributed System Integration & Performance Testing Library