rahularora / MinHash
Estimating how similar two sets are using MinHash (Jaccard similarity coefficient)
☆30 · Updated 12 years ago
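The following is a minimal Python sketch of the idea in the description. It uses the standard MinHash construction and is not necessarily how this repository implements it. Each set is reduced to a signature holding the minimum hash value under each of many random hash functions. The fraction of positions where two signatures agree then estimates the Jaccard coefficient, len(A & B) / len(A | B). The helper names, the use of `zlib.crc32` to turn tokens into integers, and the (a·x + b) mod p hash family are assumptions made for illustration.

```python
import random
import zlib

# Mersenne prime 2**61 - 1, used as the modulus of the (assumed) hash family.
_PRIME = (1 << 61) - 1


def jaccard(a, b):
    """Exact Jaccard similarity len(A & B) / len(A | B); 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_funcs(num_perm, seed=1):
    """Draw num_perm (a, b) pairs defining h(x) = (a*x + b) mod _PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(num_perm)]


def minhash_signature(items, hash_funcs):
    """For each hash function, keep the minimum hash value over a non-empty set."""
    if not items:
        raise ValueError("MinHash signature of an empty set is undefined")
    # crc32 gives a stable integer per token; Python's str hash is salted per process.
    base = [zlib.crc32(str(x).encode("utf-8")) for x in items]
    return [min((a * v + b) % _PRIME for v in base) for a, b in hash_funcs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where both sets have the same minimum."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must come from the same hash functions")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = set("the quick brown fox jumps over the lazy dog".split())
    b = set("the quick brown fox leaps over the lazy cat".split())
    funcs = make_hash_funcs(256)
    print(jaccard(a, b))  # exact value: 0.6 (6 shared words out of 10 distinct)
    print(estimate_jaccard(minhash_signature(a, funcs),
                           minhash_signature(b, funcs)))  # an estimate near 0.6
```

With 256 hash functions, the standard error of the estimate is roughly sqrt(J(1 − J)/256), which is about 0.03 at J = 0.6. Adding more hash functions tightens the estimate, at a proportional cost in time and signature size.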
Alternatives and similar repositories for MinHash:
Users who are interested in MinHash are comparing it to the libraries listed below
- A fast Python implementation of locality sensitive hashing.☆70 · Updated 10 years ago
- A pure python implementation of locality sensitive hashing for text documents☆85 · Updated 9 years ago
- LSH based high dimensional clustering for sets and points☆79 · Updated 10 years ago
- POC IDS anomaly detection engine built with iPython notebook, matplotlib, pandas, numpy, scikit-learn, d3.js, hyperloglog implementation,…☆79 · Updated 10 years ago
- Simple approximate-nearest-neighbours in Python using locality sensitive hashing.☆140 · Updated 12 years ago
- feng - feature engineering for machine-learning champions☆27 · Updated 8 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013)☆55 · Updated 3 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python☆26 · Updated 13 years ago
- Topological Anomaly Detection (TAD) per Gartley and Basener 2009☆69 · Updated 4 years ago
- An implementation of LDA in Python, from Heinrich's paper: http://www.arbylon.net/publications/text-est.pdf☆43 · Updated 15 years ago
- Wabbit Wappa is a full-featured Python wrapper for the Vowpal Wabbit machine learning utility.☆101 · Updated 7 years ago
- ☆24 · Updated 6 years ago
- FluRS: A Python library for streaming recommendation algorithms☆109 · Updated 3 years ago
- locality sensitive hashing☆71 · Updated 12 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible.☆51 · Updated 7 years ago
- Implementation of Bayesian Sets for fast similarity searches.☆14 · Updated 13 years ago
- SmallK: very fast data clustering tools☆14 · Updated 6 years ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig…☆99 · Updated 9 years ago
- Preparing DMOZ dataset for my n-Gram LM-based URL classification research☆32 · Updated 10 years ago
- Recommender System Framework☆125 · Updated 8 years ago
- Clustering documents based on LSH☆14 · Updated 9 years ago
- This project contains the code to translate between Apache Spark and SFrame.☆20 · Updated 8 years ago
- CrowdRec reference framework☆32 · Updated 8 years ago
- C++ Ternary Search Tree implementation with Python bindings☆43 · Updated 7 years ago
- Official repository of Quickscorer: a fast algorithm to rank documents with additive ensembles of regression trees.☆18 · Updated 8 years ago
- ☆39 · Updated 8 years ago
- google all pairs similarity search package, with swig bindings☆22 · Updated 10 years ago
- ☆27 · Updated 9 years ago
- ☆26 · Updated 8 years ago
- Weighted MinHash implementation on CUDA (multi-gpu).☆117 · Updated last year
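Several of the alternatives above implement locality sensitive hashing. A common way to pair LSH with MinHash is the banding scheme. The sketch below assumes that scheme and takes MinHash signatures as its input; it is not taken from any of the listed repositories. Each signature of length bands × rows is cut into bands. Sets whose signatures match exactly on at least one band land in the same bucket and become candidate pairs. A pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b. The function names and the toy signatures are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands, rows):
    """Banding LSH over MinHash signatures.

    signatures maps a key (e.g. a document id) to a list of bands * rows ints.
    Returns the sorted key pairs that agree on every row of at least one band.
    """
    for key, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError(f"signature for {key!r} must have {bands * rows} values")
    candidates = set()
    for band in range(bands):
        start = band * rows
        buckets = defaultdict(list)
        for key, sig in signatures.items():
            # The band's values themselves serve as the bucket key.
            buckets[tuple(sig[start:start + rows])].append(key)
        for keys in buckets.values():
            # Every pair of keys sharing a bucket becomes a candidate.
            candidates.update(combinations(sorted(keys), 2))
    return candidates


def candidate_probability(s, bands, rows):
    """Chance that a pair with Jaccard similarity s shares at least one band."""
    return 1 - (1 - s ** rows) ** bands


if __name__ == "__main__":
    sigs = {
        "doc1": [1, 2, 3, 4, 5, 6],
        "doc2": [1, 2, 9, 9, 5, 6],
        "doc3": [7, 8, 9, 10, 11, 12],
    }
    print(lsh_candidate_pairs(sigs, bands=3, rows=2))  # {('doc1', 'doc2')}
    print(candidate_probability(0.5, bands=3, rows=2))  # 0.578125
```

Increasing rows makes the filter stricter, while increasing bands makes it more permissive. Together they place the steep part of the S-shaped candidate curve near s ≈ (1/b)^(1/r). Only the candidate pairs then need an exact or MinHash-estimated similarity check.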