Starting next week, deep learning stacks such as TensorFlow and Caffe, along with their related libraries and examples, will appear in the roundup, as they are becoming a vital part of machine learning pipelines built on a foundation of Big Data stacks.
The main focus of the roundup remains, as before, to provide summarized links to GitHub repositories. Other aggregators in this space have done a tremendous job of collecting GitHub links, academic papers, and user comments. To respect their work and to differentiate from them, the primary focus will be on deep learning stacks that work well with, and amplify, what Big Data frameworks bring to the table.
Burrow – Kafka Consumer Lag Checking
Burrow is a monitoring companion for Apache Kafka that provides consumer lag checking as a service without the need for specifying thresholds.
StreamSets Data Collector is an enterprise grade, open source, continuous big data ingestion infrastructure.
API and command line interface for HDFS.
mlpack is an intuitive, fast, scalable C++ machine learning library, meant to be a machine learning analog to LAPACK.
EclairJS enables web applications and Jupyter Notebooks to work with Spark.
Photon Machine Learning (Photon-ML)
A scalable machine learning library on Apache Spark.
XGBoost – eXtreme Gradient Boosting
Scalable, portable, and distributed gradient boosting (GBDT, GBRT, or GBM) library for Python, R, Java, Scala, C++, and more. Runs on a single machine, Hadoop, Spark, Flink, and DataFlow.
A realtime distributed OLAP datastore.