chrisjmccormick / MinHash
Example Python code for comparing documents using MinHash
☆250 Updated 5 years ago
Related projects
Alternatives and complementary repositories for MinHash
- LSH based high dimensional clustering for sets and points ☆79 Updated 9 years ago
- Simhash and near-duplicate detection ☆410 Updated last year
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆282 Updated last year
- A pure python implementation of locality sensitive hashing for text documents ☆87 Updated 9 years ago
- Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and… ☆461 Updated 5 years ago
- Estimating how similar are two sets using MinHash (Jaccard similarity coefficient) ☆30 Updated 11 years ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆100 Updated 9 years ago
- Collaborative modeling for recommendation. Implements variational inference for a collaborative topic models. These models recommend item… ☆147 Updated 9 years ago
- Various gfx for a presentation at NYC ML meetup ☆57 Updated 9 years ago
- ☆130 Updated 3 years ago
- An efficient simhash implementation for python ☆124 Updated 5 years ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆137 Updated 3 months ago
- RiVal recommender system evaluation toolkit ☆151 Updated 5 years ago
- Implicit matrix factorization as outlined in http://yifanhu.net/PUB/cf.pdf. ☆282 Updated 8 years ago
- Instructions & code for the EuroPython 2014 training session "Topic Modeling for Fun and Profit" ☆110 Updated 10 years ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆766 Updated last year
- Twitter named entity extraction for WNUT 2016 http://noisy-text.github.io/2016/ner-shared-task.html ☆139 Updated 2 years ago
- A news recommendation evaluation framework ☆43 Updated 6 years ago
- pyxDamerauLevenshtein implements the Damerau-Levenshtein (DL) edit distance algorithm for Python in Cython for high performance. ☆243 Updated 6 months ago
- Open Source Implementation of Simhash in Python ☆24 Updated 7 years ago
- Fast, DB Backed pretrained word embeddings for natural language processing. ☆223 Updated last year
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆142 Updated 2 months ago
- FluRS: A Python library for streaming recommendation algorithms ☆108 Updated 2 years ago
- A fast Python implementation of locality sensitive hashing. ☆70 Updated 9 years ago
- This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. ☆166 Updated 8 years ago
- Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text" ☆142 Updated last year
- A Python Implementation of Simhash Algorithm ☆980 Updated 2 years ago
- Additive Groves, Bagged Trees with Feature Evaluation, Interaction Detection, Visualization of Feature Effects. ☆66 Updated 3 years ago
- Train a Word2Vec model or LSA model, and Implement Conceptual Search\Semantic Search in Solr\Lucene - Simon Hughes Dice.com, Dice Tech Jo… ☆257 Updated 5 years ago
- Weighted MinHash implementation on CUDA (multi-gpu). ☆114 Updated 11 months ago