kiranvodrahalli / cos521
Final project for COS 521: using the Hokusai algorithm to approximate frequency counts of hashtags in a Twitter data stream.
☆12 Updated 10 years ago
Alternatives and similar repositories for cos521:
Users who are interested in cos521 are comparing it to the libraries listed below.
- HyperLogLog and other probabilistic data structures for mining in data streams ☆14 Updated 10 years ago
- Implementation of an algorithm computing the nearest "N" neighbours to a vector, using a collection of hyperplane hashers. ☆30 Updated 9 years ago
- Count-Min Tree Sketch: Approximate counting for NLP ☆10 Updated 7 years ago
- From zero to a Storm cluster for real-time classification using sklearn ☆12 Updated 10 years ago
- My 2nd place submission (working with Kevin Goetsch) out of 28 teams at the Kaggle competition at PyCon 2015. ☆23 Updated 10 years ago
- Large scale matrix factorization on GPU ☆19 Updated 8 years ago
- Regularized latent variable mixed membership modeling ☆13 Updated 11 years ago
- Notes on Lambda Architecture ☆12 Updated 7 years ago
- Datasets and notebooks ☆13 Updated 8 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 Updated 3 years ago
- High performance implementations of gradient boosting, random forests, etc. in Go ☆61 Updated 11 years ago
- A board game recommendation engine/model/website. ☆39 Updated 8 years ago
- Predicting sales with Pandas ☆15 Updated 9 years ago
- Machine Learning solution for Kaggle.com's "Partly Sunny with a Chance of Hashtags" ☆27 Updated 11 years ago
- A collection of documents and materials for the EMNLP-2015 Semantic Similarity tutorial ☆30 Updated 9 years ago
- A multi-armed bandit to A/B test Go projects, or projects in other languages via an API. ☆71 Updated 11 years ago
- Python implementation of nonparametric nearest-neighbor-based estimators for divergences between distributions. ☆48 Updated 8 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 Updated 8 years ago
- Based on Thompson sampling with the online bootstrap (Dean Eckles, Maurits Kaptein). http://arxiv.org/abs/1410.4009 ☆11 Updated 10 years ago
- Scalable inference for Correlated Topic Models ☆30 Updated 10 years ago
- Implementation of Bayesian Sets for fast similarity searches. ☆14 Updated 13 years ago
- brat rapid annotation tool (brat) - for all your textual annotation needs ☆10 Updated 7 years ago
- Dato/Turi DS Conf talk on NLP and Elasticsearch analysis of reviews, plus JS implementation ☆45 Updated 8 years ago
- Using Word2Vec on lists and sets ☆34 Updated 9 years ago
- Code for KDD 2014 ☆16 Updated 9 years ago
- A parallel IRWLS library to solve SVMs and budgeted SVMs ☆59 Updated 7 years ago
- Document or binary file vectorization with Normalized Compression Distance in Python. ☆17 Updated 9 years ago
- Deep learning certificate part 1 ☆10 Updated 3 years ago
- In-database parallel grid-search for XGBoost on Greenplum ☆15 Updated 7 years ago
- A startup search engine made using embeddings built on crunchbase company descriptions ☆11 Updated 9 years ago