rahularora / MinHash
Estimating how similar two sets are using MinHash (Jaccard similarity coefficient)
☆30 · Updated 11 years ago
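The idea behind the description above can be illustrated with a minimal sketch. It is not taken from the rahularora/MinHash repository, and all names are hypothetical. MinHash applies k hash functions to every element of a set and keeps only the minimum value per function. For an ideal random permutation, two sets share the same minimum with probability equal to their Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B|. So the fraction of matching signature positions estimates J. This sketch assumes the universal family h(x) = (a·x + b) mod p, with p = 2^61 − 1 applied to a SHA-1 base hash, as an approximation of random permutations.

```python
import hashlib
import random

# Modulus for the hash family h(x) = (a*x + b) mod p.
# The Mersenne prime 2**61 - 1 is an assumption of this sketch.
_PRIME = (1 << 61) - 1


def _base_hash(item: str) -> int:
    """Map an element to a 64-bit integer with a stable (non-randomized) hash."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_perm: int, seed: int = 1) -> list[tuple[int, int]]:
    """Draw (a, b) for num_perm hash functions that stand in for random permutations."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_perm)]


def minhash_signature(items: set[str], params: list[tuple[int, int]]) -> list[int]:
    """Keep, for each hash function, the minimum value over the (non-empty) set."""
    xs = [_base_hash(x) for x in items]
    return [min((a * x + b) % _PRIME for x in xs) for a, b in params]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of positions where two signatures agree estimates J(A, B)."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, computed directly for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = set("the quick brown fox jumps over the lazy dog".split())
    b = set("the quick brown fox leaps over the lazy cat".split())
    params = make_hash_params(256)
    print(exact_jaccard(a, b))  # 0.6
    print(estimate_jaccard(minhash_signature(a, params), minhash_signature(b, params)))
```

Each element is hashed once with SHA-1, and a cheap affine transform then derives the k per-function values. This avoids k cryptographic hashes per element. With ideal independent hash functions, the estimate's standard error is roughly sqrt(J(1 − J)/k), so 256 functions give about ±0.03 around J = 0.6.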
Related projects
Alternatives and complementary repositories for MinHash
- A fast Python implementation of locality sensitive hashing. ☆70 · Updated 9 years ago
- LSH-based high-dimensional clustering for sets and points ☆79 · Updated 10 years ago
- A pure Python implementation of locality sensitive hashing for text documents ☆87 · Updated 9 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 · Updated 2 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆21 · Updated 8 years ago
- ☆36 · Updated 10 years ago
- feng - feature engineering for machine-learning champions ☆27 · Updated 7 years ago
- Simple approximate-nearest-neighbours in Python using locality sensitive hashing. ☆140 · Updated 12 years ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆100 · Updated 9 years ago
- CrowdRec reference framework ☆32 · Updated 7 years ago
- ☆24 · Updated 6 years ago
- Various gfx for a presentation at NYC ML meetup ☆57 · Updated 9 years ago
- Topological Anomaly Detection (TAD) per Gartley and Basener 2009 ☆70 · Updated 4 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 · Updated 7 years ago
- 4th Place Solution for The Hunt for Prohibited Content Competition on Kaggle (http://www.kaggle.com/c/avito-prohibited-content) ☆29 · Updated 10 years ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation ☆98 · Updated 13 years ago
- ☆26 · Updated 7 years ago
- Wabbit Wappa is a full-featured Python wrapper for the Vowpal Wabbit machine learning utility. ☆101 · Updated 7 years ago
- POC IDS anomaly detection engine built with iPython notebook, matplotlib, pandas, numpy, scikit-learn, d3.js, hyperloglog implementation,… ☆78 · Updated 10 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆144 · Updated 2 months ago
- Automated NLP sentiment predictions: batteries included, or use your own data ☆18 · Updated 6 years ago
- Parallel Iterative Algorithm (SGD) on Hadoop's YARN framework ☆42 · Updated 11 years ago
- An Exploration into Graph Databases ☆28 · Updated 9 years ago
- FluRS: A Python library for streaming recommendation algorithms ☆108 · Updated 2 years ago
- locality sensitive hashing ☆69 · Updated 12 years ago
- Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree ☆31 · Updated 12 years ago
- My capstone project for Galvanize (Zipfian Academy) ☆38 · Updated 5 years ago
- A Python wrapper for MADlib (http://madlib.net), an open source library for scalable in-database machine learning algorithms ☆63 · Updated 4 years ago
- Simple clustering library for Python. ☆65 · Updated 3 years ago
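Several of the projects above implement locality sensitive hashing (LSH). With MinHash signatures, a common LSH scheme is banding. Each signature of length b·r is split into b bands of r rows. Two items become a candidate pair if they agree on every row of at least one band. Under ideal MinHash, a pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b. The sketch below is a minimal illustration and does not reproduce the API of any listed library. All function names are hypothetical, and salted BLAKE2b hashes stand in for random permutations.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def signature(items: set[str], num_perm: int) -> list[int]:
    """MinHash signature: hash function i is BLAKE2b over the salted string 'i:item'."""
    def h(i: int, x: str) -> int:
        digest = hashlib.blake2b(f"{i}:{x}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return [min(h(i, x) for x in items) for i in range(num_perm)]


def lsh_candidates(signatures: dict[str, list[int]], bands: int, rows: int) -> set[tuple[str, str]]:
    """Return pairs of keys whose signatures match on all rows of at least one band.

    Assumes every signature has exactly bands * rows entries.
    """
    candidates: set[tuple[str, str]] = set()
    for band in range(bands):
        buckets: dict[tuple[int, ...], list[str]] = defaultdict(list)
        for key, sig in signatures.items():
            # The band's slice of the signature serves as the bucket key.
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(key)
        for keys in buckets.values():
            # Every pair sharing a bucket in this band is a candidate.
            candidates.update(combinations(sorted(keys), 2))
    return candidates


def candidate_probability(s: float, bands: int, rows: int) -> float:
    """Probability that a pair with Jaccard similarity s shares at least one band."""
    return 1.0 - (1.0 - s ** rows) ** bands


if __name__ == "__main__":
    docs = {
        "d1": set("a b c d e f g h".split()),
        "d2": set("a b c d e f g x".split()),  # Jaccard(d1, d2) = 7/9
        "d3": set("p q r s t u v w".split()),  # disjoint from d1 and d2
    }
    bands, rows = 20, 3
    sigs = {k: signature(v, bands * rows) for k, v in docs.items()}
    print(lsh_candidates(sigs, bands, rows))
    print(candidate_probability(7 / 9, bands, rows))
```

The choice of b and r shapes the S-curve. More rows per band suppress low-similarity pairs, and more bands raise recall for high-similarity pairs. Only the candidates then need an exact similarity check, so near-duplicate search avoids comparing all pairs.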