Productivity-centric Python data analysis framework for SQL systems and the Hadoop platform. Co-founded by the creator of pandas.
Livy is an open-source REST interface for interacting with Apache Spark from anywhere
Distributed computation in Python
A high performance implementation of HDBSCAN clustering.
Scalable machine learning library for Hive/Hadoop
A library of extension and helper modules for Python’s data analysis and machine learning libraries
Auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
The MongoDB Spark Connector
Minio is an object storage server compatible with Amazon S3.
Torchnet is a framework for Torch that provides a set of abstractions to encourage code reuse and modular programming.
A Python tool that automatically cleans data sets and readies them for analysis
The Deep Mining project aims to find the best hyperparameter set for a machine learning pipeline.
General Assembly’s Data Science course material in Washington, DC
Practice your pandas skills!