chrisjmccormick / MinHash
Example Python code for comparing documents using MinHash
☆251 Updated 6 years ago
Alternatives and similar repositories for MinHash:
Users who are interested in MinHash are comparing it to the libraries listed below.
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆284 Updated last year
- LSH based high dimensional clustering for sets and points ☆79 Updated 10 years ago
- Simhash and near-duplicate detection ☆414 Updated last year
- A pure python implementation of locality sensitive hashing for text documents ☆85 Updated 9 years ago
- An efficient simhash implementation for python ☆124 Updated 5 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆146 Updated 7 months ago
- Estimating how similar are two sets using MinHash (Jaccard similarity coefficient) ☆30 Updated 12 years ago
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,671 Updated 10 months ago
- Application of Locality Sensitive Hashing to Audio Fingerprinting ☆59 Updated 6 years ago
- A fast Python implementation of locality sensitive hashing. ☆664 Updated 4 years ago
- Text classification example in Python using Latent Semantic Analysis (LSA) ☆105 Updated 6 years ago
- Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and… ☆460 Updated 6 years ago
- Various gfx for a presentation at NYC ML meetup ☆59 Updated 9 years ago
- A fast Python implementation of locality sensitive hashing. ☆70 Updated 10 years ago
- ☆189 Updated 10 months ago
- From Zero to Learning to Rank in Apache Solr ☆180 Updated 4 years ago
- Train a Word2Vec model or LSA model, and Implement Conceptual Search\Semantic Search in Solr\Lucene - Simon Hughes Dice.com, Dice Tech Jo… ☆256 Updated 5 years ago
- It is a forest of random projection trees ☆224 Updated 5 years ago
- This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. ☆166 Updated 8 years ago
- Deep Learning for Natural Language Processing ☆458 Updated 6 years ago
- A Python Implementation of Simhash Algorithm ☆1,007 Updated 3 years ago
- Collaborative modeling for recommendation. Implements variational inference for a collaborative topic models. These models recommend item… ☆147 Updated 9 years ago
- A library of learning to rank algorithms ☆99 Updated 4 years ago
- A fast and scalable C++ library for implicit-feedback matrix factorization models ☆464 Updated 2 years ago
- pyxDamerauLevenshtein implements the Damerau-Levenshtein (DL) edit distance algorithm for Python in Cython for high performance. ☆246 Updated 11 months ago
- A repository for Neural Document Ranking Models. ☆84 Updated 6 years ago
- Recommender System Framework ☆125 Updated 8 years ago
- Implicit matrix factorization as outlined in http://yifanhu.net/PUB/cf.pdf. ☆283 Updated 8 years ago
- Open Source Implementation of Simhash in Python ☆24 Updated 7 years ago
- A Learning to Rank Library ☆135 Updated 12 years ago