chrisjmccormick / MinHash
Example Python code for comparing documents using MinHash
☆251 · Updated 6 years ago
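For orientation, here is a minimal sketch of the MinHash idea that the description above refers to. It is not taken from the repository. The function names, the 3-word shingle size, the CRC32 shingle IDs, and the hash family `(a*x + b) mod p` with `p = 2**61 - 1` are all assumptions chosen for illustration.

```python
import random
import zlib

# Assumption: a large prime modulus for the illustrative hash family.
PRIME = (1 << 61) - 1


def shingles(text, k=3):
    """Return the set of k-word shingles (lowercased) found in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for num_hashes hash functions."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """Return the MinHash signature of a non-empty shingle set.

    Each shingle is mapped to an integer ID with CRC32; each signature
    component is the minimum of one hash function over all IDs.
    """
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_similarity(sig1, sig2):
    """Fraction of signature positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
    doc2 = "the quick brown fox jumps over the lazy cat near the river bank"
    s1, s2 = shingles(doc1), shingles(doc2)
    params = make_hash_params(200, seed=42)
    sig1 = minhash_signature(s1, params)
    sig2 = minhash_signature(s2, params)
    print("exact Jaccard:   ", jaccard(s1, s2))
    print("MinHash estimate:", estimate_similarity(sig1, sig2))
```

The fraction of matching signature positions approximates the Jaccard similarity of the two shingle sets, and the approximation generally tightens as the number of hash functions grows.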
Alternatives and similar repositories for MinHash:
Users who are interested in MinHash are comparing it to the libraries listed below.
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆284 · Updated last year
- LSH based high dimensional clustering for sets and points ☆78 · Updated 10 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆85 · Updated 9 years ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆99 · Updated 9 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆146 · Updated 6 months ago
- Instructions & code for the EuroPython 2014 training session "Topic Modeling for Fun and Profit" ☆110 · Updated 10 years ago
- Various gfx for a presentation at NYC ML meetup ☆59 · Updated 9 years ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆767 · Updated 2 years ago
- Topic modeling with gensim and LDA ☆168 · Updated 7 years ago
- ☆214 · Updated 2 years ago
- Simhash and near-duplicate detection ☆413 · Updated last year
- Implicit matrix factorization as outlined in http://yifanhu.net/PUB/cf.pdf. ☆282 · Updated 8 years ago
- Weighted MinHash implementation on CUDA (multi-gpu). ☆117 · Updated last year
- It is a forest of random projection trees ☆223 · Updated 5 years ago
- All-pair set similarity search on millions of sets in Python and on a laptop ☆593 · Updated 2 years ago
- pyxDamerauLevenshtein implements the Damerau-Levenshtein (DL) edit distance algorithm for Python in Cython for high performance. ☆245 · Updated 10 months ago
- From Zero to Learning to Rank in Apache Solr ☆181 · Updated 4 years ago
- An efficient simhash implementation for python ☆124 · Updated 5 years ago
- Some add-on modules to networkx library ☆78 · Updated 4 years ago
- Twitter hashtag prediction ☆281 · Updated 7 years ago
- ClickModels is a small set of Python scripts for the user click models initially developed at Yandex. A Click Model is a probabilistic gr… ☆240 · Updated 6 years ago
- A fast implementation of GloVe, with optional retrofitting ☆243 · Updated last year
- Python port of Mikolov's word2phrase.c from the word2vec toolkit ☆111 · Updated 4 years ago
- Automatically exported from code.google.com/p/jforests ☆67 · Updated 4 years ago
- Train a Word2Vec model or LSA model, and Implement Conceptual Search\Semantic Search in Solr\Lucene - Simon Hughes Dice.com, Dice Tech Jo… ☆257 · Updated 5 years ago
- Automatic labeling for topic model ☆57 · Updated 9 years ago
- Collaborative modeling for recommendation. Implements variational inference for a collaborative topic models. These models recommend item… ☆147 · Updated 9 years ago
- Latent Dirichlet Allocation (LDA) model for Microblogs (Twitter, weibo etc.) ☆320 · Updated 6 years ago
- A library for k-nearest neighbor search ☆384 · Updated 10 months ago
- Scalable Topic Modeling using Variational Inference in MapReduce ☆150 · Updated 9 years ago