kalaidin / sketches
HyperLogLog and other probabilistic data structures for mining in data streams
☆14 Updated 10 years ago
Alternatives and similar repositories for sketches
Users who are interested in sketches are comparing it to the libraries listed below
- Final project for COS 521: using the Hokusai algorithm to approximate frequency counts of hashtags in a Twitter data stream. ☆12 Updated 10 years ago
- My 2nd place submission (working with Kevin Goetsch) out of 28 teams at the Kaggle competition at PyCon2015. ☆23 Updated 10 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 Updated 3 years ago
- Yet another regression toolkit ☆12 Updated 11 years ago
- Predicting closed questions on Stack Overflow ☆44 Updated 7 years ago
- ☆18 Updated 9 years ago
- Regularized latent variable mixed membership modeling ☆13 Updated 11 years ago
- Large scale matrix factorization on GPU ☆19 Updated 9 years ago
- Healthcare Twitter Analysis ☆26 Updated 9 years ago
- Implementation of Bayesian Sets for fast similarity searches. ☆14 Updated 13 years ago
- Kaggle Allen AI competition ☆17 Updated 9 years ago
- ☆49 Updated 7 years ago
- Material and slides for Boston NLP meetup May 23rd 2016 ☆17 Updated 9 years ago
- Implementation of an algorithm computing the nearest "N" neighbours to a vector, using a collection of hyperplane hashers. ☆30 Updated 9 years ago
- Code for the "Burn CPU, burn" competition at Kaggle. Uses Extreme Learning Machines and hyperopt. ☆33 Updated 10 years ago
- Source code for exploring MLlib blog post ☆11 Updated 10 years ago
- Second-ranked solution to the Kaggle "Flavours of Physics" competition ☆25 Updated 9 years ago
- ☆12 Updated 9 years ago
- Document or binary file vectorization with Normalized Compression Distance in Python. ☆17 Updated 9 years ago
- Dionis predictors blender ☆10 Updated 9 years ago
- ☆25 Updated 9 years ago
- Datasets and notebooks ☆13 Updated 8 years ago
- hacky exploratory variants on NN language models ☆9 Updated 9 years ago
- Predicting sales with Pandas ☆15 Updated 9 years ago
- In-database parallel grid-search for XGBoost on Greenplum ☆15 Updated 7 years ago
- The notes and slides from my PyCon Ireland 2016 PyData talk, an introduction to gradient boosting ☆18 Updated 8 years ago
- Repo for experiments on pyspark and sklearn ☆79 Updated 11 years ago
- A set of methods that predict the future values of popularity indices for news posts using a variety of features. ☆33 Updated 7 years ago
- Dato/Turi DS Conf talk on NLP and Elasticsearch analysis of reviews, plus JS implementation ☆45 Updated 8 years ago
- Kaggle competition results ☆20 Updated 6 years ago