lintool / bigdata-2016w
CS 489/698 Big Data Infrastructure (Winter 2016) at the University of Waterloo
☆39 · Updated 8 years ago
Related projects
Alternatives and complementary repositories for bigdata-2016w
- Quickly start YARN cluster on EC2 ☆30 · Updated 7 years ago
- Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark ☆31 · Updated 6 years ago
- Scikit-learn quickstart tutorial for Webstep ☆18 · Updated 7 years ago
- Public materials for the Fall 2016 offering of CS145 ☆35 · Updated 7 years ago
- Backup of papers read ☆17 · Updated 8 years ago
- Course homepages for courses that I've taught at the University of Maryland ☆53 · Updated 8 years ago
- Scalable Distributed LDA implementation for Spark & Glint ☆28 · Updated 8 years ago
- Repo for experiments on pyspark and sklearn ☆79 · Updated 10 years ago
- Pydata NYC 2014 Scikit Learn Tutorial ☆64 · Updated 9 years ago
- GPU Acceleration for Apache Spark ☆34 · Updated 9 years ago
- Fast Ensembles of Sparse Trees ☆38 · Updated 8 years ago
- Logistic regression engine for medium-sized data ☆55 · Updated 9 years ago
- Distributed Matrix Library ☆70 · Updated 7 years ago
- ☆24 · Updated 8 years ago
- Testing framework for Collaborative Filtering ☆38 · Updated 9 years ago
- Latency numbers every data scientist should know (aka the pyramid of analytical tasks) - the order of magnitude of computational time for… ☆20 · Updated 7 years ago
- Demo of random projections at BerlinBuzzwords 2015 ☆22 · Updated 4 years ago
- Ping the world! ☆82 · Updated 9 years ago
- Notebooks (and slides) for my PyData NYC 2014 tutorial on the more advanced features of scikit-learn. ☆69 · Updated 9 years ago
- Additional files for the Otto Group Challenge hosted by Kaggle ☆36 · Updated 9 years ago
- Reference implementations of data-intensive algorithms in MapReduce and Spark ☆82 · Updated 6 years ago
- Code for the Kaggle acquire valued shoppers challenge ☆66 · Updated 10 years ago
- Second-ranked solution to the Kaggle "Flavours of Physics" competition ☆24 · Updated 8 years ago
- PyData Madrid 2016 material for the talk: A Primer to recommendation Systems ☆37 · Updated 8 years ago
- Source code for the tutorial series at http://www.thoughtly.co/blog/prototype ☆32 · Updated 9 years ago
- A framework for building reranking models. ☆29 · Updated 9 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆91 · Updated 8 years ago
- Spark MLlib code optimized to efficiently support sparse data ☆50 · Updated 7 years ago
- Demo code contrasting Google Dataflow (Apache Beam) with Apache Spark ☆14 · Updated 8 years ago
- ☆46 · Updated 7 years ago