gautamsm / data-science-on-mpp
A collection of examples illustrating data processing, data science, and machine learning on the Pivotal Greenplum and HAWQ MPP databases
☆20 Updated 8 years ago
Related projects
Alternatives and complementary repositories for data-science-on-mpp
- This project contains the code to translate between Apache Spark and SFrame. ☆21 Updated 8 years ago
- A simple python wrapper over MLJAR API. ☆42 Updated 2 years ago
- Tutorial for Deploying Anaconda Cluster and PySpark on top of Red Hat Storage GlusterFS ☆8 Updated 9 years ago
- feng - feature engineering for machine-learning champions ☆27 Updated 7 years ago
- Spark library for doing exploratory data analysis in a scalable way ☆43 Updated 8 years ago
- Demo code contrasting Google Dataflow (Apache Beam) with Apache Spark ☆14 Updated 8 years ago
- The code for the in memory data pipeline that was presented at Berlin Buzzwords 2015. ☆10 Updated 9 years ago
- ☆41 Updated 7 years ago
- A library that allows serialization of SciKit-Learn estimators into PMML ☆70 Updated 5 years ago
- A simple example of containerized data science with python and Docker. ☆51 Updated 6 years ago
- My capstone project for Galvanize (Zipfian Academy) ☆38 Updated 5 years ago
- Docker container with a PyData stack and JupyterHub server ☆37 Updated 8 years ago
- Data science repo to help others ☆12 Updated 8 years ago
- Another, hopefully better, implementation of ALS on Spark ☆14 Updated 9 years ago
- Simplified tree-based classifier and regressor for interpretable machine learning (scikit-learn compatible) ☆47 Updated 3 years ago
- Utilities and examples to assist in working with PySpark and Cassandra. ☆36 Updated 9 years ago
- Simple validator for submissions to DrivenData competitions ☆19 Updated 5 years ago
- Experimental parallel data analysis toolkit. ☆120 Updated 3 years ago
- Machine learning evaluation database ☆24 Updated 6 years ago
- Invoke Pandas plotting by piping in SQL output via PSQL (Can be used with Postgres or Greenplum or any SQL engine). ☆16 Updated 10 years ago
- Fast, easy and intuitive machine learning prototyping. ☆124 Updated 10 years ago
- Multidimensional data explorer and visualization tool. ☆52 Updated 7 years ago
- The slides, code examples and resources for the PyCon 2015 Ireland talk on building data pipelines ☆13 Updated 9 years ago
- Material and slides for Boston NLP meetup May 23rd 2016 ☆17 Updated 8 years ago
- Repo for experiments on pyspark and sklearn ☆79 Updated 10 years ago
- ☆26 Updated 8 years ago
- Tools for performing hyperparameter search with Scikit-Learn and Dask http://dask-searchcv.readthedocs.io ☆11 Updated 7 years ago