kalaidin / sketches
HyperLogLog and other probabilistic data structures for mining in data streams
☆15 · Updated 9 years ago
Related projects
Alternatives and complementary repositories for sketches
- Yet another regression toolkit ☆12 · Updated 11 years ago
- Final project for COS 521: Using the Hokusai algorithm to approximate frequency counts of hashtags in a Twitter data stream. ☆12 · Updated 9 years ago
- Large scale matrix factorization on GPU ☆19 · Updated 8 years ago
- Implementation of an algorithm computing the nearest "N" neighbours to a vector, using a collection of hyperplane hashers. ☆30 · Updated 9 years ago
- Code for the Avito competition ☆16 · Updated 10 years ago
- Regularized latent variable mixed membership modeling ☆13 · Updated 11 years ago
- How to use automatic polynomial features and neural network mode in VW ☆17 · Updated 10 years ago
- Predicting sales with Pandas ☆15 · Updated 9 years ago
- The notes and slides from my PyCon Ireland 2016 PyData talk, an introduction to gradient boosting ☆18 · Updated 8 years ago
- Topic analysis using RSM or PVDM. ☆11 · Updated 10 years ago
- My 2nd place submission (working with Kevin Goetsch) out of 28 teams at the Kaggle competition at PyCon2015. ☆23 · Updated 9 years ago
- Healthcare Twitter Analysis ☆26 · Updated 8 years ago
- Dirichlet process mixture model (DPMM) for datamicroscopes ☆12 · Updated 9 years ago
- scikit-learn addon to operate on set/"group"-based features ☆41 · Updated 8 years ago
- Material and slides for Boston NLP meetup May 23rd 2016 ☆17 · Updated 8 years ago
- GSOC 2017 - Apache Organization - Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (python… ☆15 · Updated 7 years ago
- Sklearn implementation of GBM to predict mu(X) and std(X) on heteroscedastic data ☆27 · Updated 8 years ago
- A collection of metrics and loss functions written in Theano. ☆14 · Updated 8 years ago
- Gaussian Process Regression for Python/Numpy ☆14 · Updated 8 years ago
- Source code for exploring MLlib blog post ☆11 · Updated 9 years ago
- 4th Place Solution for The Hunt for Prohibited Content Competition on Kaggle (http://www.kaggle.com/c/avito-prohibited-content) ☆29 · Updated 10 years ago
- Kaggle Allen AI competition ☆17 · Updated 8 years ago
- Starter kit for getting started in the NIPS 2017 Criteo Ad Placement Challenge ☆19 · Updated 7 years ago
- Document or binary file vectorization with Normalized Compression Distance in Python. ☆16 · Updated 9 years ago
- Repo for experiments on pyspark and sklearn ☆79 · Updated 10 years ago
- A potential 22nd rank solution to Criteo Labs Display Advertising Challenge on Kaggle ☆26 · Updated 7 years ago
- hacky exploratory variants on NN language models ☆9 · Updated 9 years ago