chrisjmccormick / MinHash
Example Python code for comparing documents using MinHash
☆250 Updated 5 years ago
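As a rough illustration of what comparing documents with MinHash involves, the following is a minimal sketch and not the repository's actual code. The function names, the word-shingle size `k=3`, the use of `zlib.crc32` to map shingles to integers, and the modulus `2**61 - 1` are all assumptions chosen for this example.

```python
import random
import zlib

# Assumption: a Mersenne prime used as the modulus of the hash family.
PRIME = (1 << 61) - 1


def shingles(text, k=3):
    """Return the set of k-word shingles in a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """Signature = minimum hash value over all shingles, one per hash function."""
    if not shingle_set:
        raise ValueError("document has too few words to form a shingle")
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    params = make_hash_params(num_hashes=128, seed=42)
    doc_a = shingles("the quick brown fox jumps over the lazy dog")
    doc_b = shingles("the quick brown fox jumps over the lazy cat")
    sig_a = minhash_signature(doc_a, params)
    sig_b = minhash_signature(doc_b, params)
    print("estimated:", estimate_jaccard(sig_a, sig_b))
    print("exact:    ", exact_jaccard(doc_a, doc_b))
```

Both documents must be hashed with the same `params`, since signatures are only comparable position by position. More hash functions generally give a tighter estimate at the cost of longer signatures.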
Related projects
Alternatives and complementary repositories for MinHash
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆282 Updated last year
- LSH-based high-dimensional clustering for sets and points ☆79 Updated 10 years ago
- A pure Python implementation of locality sensitive hashing for text documents ☆87 Updated 9 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆144 Updated 2 months ago
- Simhash and near-duplicate detection ☆411 Updated last year
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆100 Updated 9 years ago
- Instructions & code for the EuroPython 2014 training session "Topic Modeling for Fun and Profit" ☆110 Updated 10 years ago
- pyndri is a Python interface to the Indri search engine. ☆89 Updated 2 years ago
- Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby ☆601 Updated 6 years ago
- Automatically exported from code.google.com/p/berkeleylm ☆98 Updated 8 years ago
- Various gfx for a presentation at NYC ML meetup ☆58 Updated 9 years ago
- Palmetto is a quality measuring tool for topics ☆215 Updated 9 months ago
- Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text" ☆142 Updated last year
- Weighted MinHash implementation on CUDA (multi-gpu). ☆114 Updated 11 months ago
- Estimating how similar two sets are using MinHash (Jaccard similarity coefficient) ☆30 Updated 11 years ago
- Text classification example in Python using Latent Semantic Analysis (LSA) ☆104 Updated 6 years ago
- A Learning to Rank Library ☆135 Updated 12 years ago
- Socially-Equitable Language Identification ☆78 Updated last year
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,586 Updated 5 months ago
- pyxDamerauLevenshtein implements the Damerau-Levenshtein (DL) edit distance algorithm for Python in Cython for high performance. ☆243 Updated 6 months ago
- A fast Python implementation of locality sensitive hashing. ☆70 Updated 9 years ago
- An efficient simhash implementation for Python ☆125 Updated 5 years ago
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18 ☆168 Updated 3 years ago
- Python port of the Twokenize class of ark-tweet-nlp ☆141 Updated 6 years ago
- Python Set subclass that supports searching by ngram similarity ☆120 Updated 3 years ago
- Python wrapper for Stanford CoreNLP ☆353 Updated 3 years ago
- Open Source Implementation of Simhash in Python ☆24 Updated 7 years ago
- Retrofitting Word Vectors to Semantic Lexicons ☆374 Updated 5 years ago