lintool / bigdata-2016w
CS 489/698 Big Data Infrastructure (Winter 2016) at the University of Waterloo
☆38 · Updated 8 years ago
Related projects:
- Distributed Matrix Library ☆70 · Updated 7 years ago
- Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark ☆31 · Updated 6 years ago
- My entry to the Kaggle 2012 Stack Overflow competition. Ranked 10th on the final public leaderboard. ☆46 · Updated 8 years ago
- An R-like GLM package for Apache Spark ☆10 · Updated 9 years ago
- Repo for experiments on pyspark and sklearn ☆79 · Updated 10 years ago
- Code to create benchmarks for Kaggle's Facebook Recruiting Competition ☆86 · Updated 12 years ago
- My winning solution for Kaggle Higgs Machine Learning Challenge (single classifier, xgboost) ☆81 · Updated 9 years ago
- Latency numbers every data scientist should know (aka the pyramid of analytical tasks) - the order of magnitude of computational time for… ☆20 · Updated 7 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆91 · Updated 8 years ago
- A primal-dual framework for distributed L1-regularized optimization ☆35 · Updated 8 years ago
- Pydata NYC 2014 Scikit Learn Tutorial ☆64 · Updated 9 years ago
- CS 489/698 Big Data Infrastructure (Winter 2017) at the University of Waterloo ☆15 · Updated 7 years ago
- This repository contains code files, specifically IPython notebooks, for the assignments in the course "Scalable Machine Learning" by UC Be… ☆30 · Updated 9 years ago
- Assignments of CS190.1x, Scalable Machine Learning ☆18 · Updated 9 years ago
- GPU Acceleration for Apache Spark ☆34 · Updated 9 years ago
- Demo code contrasting Google Dataflow (Apache Beam) with Apache Spark ☆14 · Updated 8 years ago
- An API for Distributed Machine Learning ☆154 · Updated 7 years ago
- Creates models to classify documents into categories ☆66 · Updated 6 years ago
- Predicting closed questions on Stack Overflow ☆46 · Updated 6 years ago
- Slides for quick intro to machine learning with sklearn ☆65 · Updated 10 years ago
- Testing framework for Collaborative Filtering ☆38 · Updated 9 years ago
- Spark library for doing exploratory data analysis in a scalable way ☆43 · Updated 8 years ago
- Recommender system that implements Simon Funk's iterative and approximation of Singular Value Decomposition made popular from the Netflix… ☆10 · Updated 8 years ago
- Distributed Streaming Quantiles (for PySpark) ☆37 · Updated 10 years ago
- Spark MLlib code optimized to efficiently support sparse data ☆50 · Updated 7 years ago