chrisjmccormick / MinHash
Example Python code for comparing documents using MinHash
☆251 · Updated 6 years ago
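As an illustration of the technique this repository covers, here is a minimal, self-contained sketch of comparing two documents with MinHash. It is not the repository's own code: the word-level 3-shingles, the MD5-based shingle IDs, the `(a*x + b) mod p` hash family, and the function names (`shingles`, `minhash_signature`, `estimate_jaccard`) are illustrative assumptions. The fraction of positions where two signatures agree estimates the Jaccard similarity of the two shingle sets.

```python
import hashlib
import random

# Illustrative sketch: all names and parameters here are hypothetical.
_PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit shingle ID


def shingles(text, k=3):
    """Return the set of word-level k-shingles (k=3 is an assumed default)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def shingle_id(shingle):
    """Stable 32-bit ID per shingle (built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(shingle.encode("utf-8")).digest()[:4], "big")


def make_hash_params(num_perm=128, seed=1):
    """Draw (a, b) pairs for num_perm random hash functions h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_perm)]


def minhash_signature(shingle_set, params):
    """For each hash function, keep the minimum hash value over the set."""
    ids = [shingle_id(s) for s in shingle_set]
    return [min((a * x + b) % _PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(1 for x, y in zip(sig_a, sig_b) if x == y) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b) if a | b else 1.0


doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
params = make_hash_params(num_perm=128, seed=1)
s1, s2 = shingles(doc1), shingles(doc2)
sig1, sig2 = minhash_signature(s1, params), minhash_signature(s2, params)
print("exact Jaccard:", jaccard(s1, s2))
print("MinHash estimate:", estimate_jaccard(sig1, sig2))
```

Using more hash functions (`num_perm`) lowers the variance of the estimate at the cost of longer signatures.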
Alternatives and similar repositories for MinHash
Users interested in MinHash also compare it to the libraries listed below.
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near-duplicate text documents ☆290 · Updated 2 years ago
- LSH-based high-dimensional clustering for sets and points ☆79 · Updated 10 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets. ☆147 · Updated 10 months ago
- Simhash and near-duplicate detection ☆416 · Updated 2 years ago
- Various gfx for a presentation at NYC ML meetup ☆60 · Updated 9 years ago
- A pure Python implementation of locality sensitive hashing for text documents ☆85 · Updated 9 years ago
- Train a Word2Vec model or LSA model, and implement Conceptual Search/Semantic Search in Solr/Lucene - Simon Hughes Dice.com, Dice Tech Jo… ☆257 · Updated 6 years ago
- experiments and snippets used on the blog ☆145 · Updated last year
- ClickModels is a small set of Python scripts for the user click models initially developed at Yandex. A Click Model is a probabilistic gr… ☆239 · Updated 7 years ago
- CMU ARK Twitter Part-of-Speech Tagger ☆575 · Updated last year
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆767 · Updated 2 years ago
- Instructions & code for the EuroPython 2014 training session "Topic Modeling for Fun and Profit" ☆110 · Updated 10 years ago
- From Zero to Learning to Rank in Apache Solr ☆185 · Updated 4 years ago
- word2vec Google News model slimmed down to 300k English words ☆216 · Updated 8 years ago
- Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and… ☆460 · Updated 6 years ago
- pyndri is a Python interface to the Indri search engine. ☆89 · Updated 3 years ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆99 · Updated 10 years ago
- Additive Groves, Bagged Trees with Feature Evaluation, Interaction Detection, Visualization of Feature Effects. ☆65 · Updated 4 years ago
- Language Detection with Infinity-gram ☆230 · Updated 10 years ago
- 💫 Scripts, tools and resources for developing spaCy ☆126 · Updated 6 years ago
- A forest of random projection trees ☆223 · Updated 5 years ago
- A repository for Neural Document Ranking Models. ☆84 · Updated 6 years ago
- Python code for detecting topics/events from a Twitter stream ☆100 · Updated 6 years ago
- Python port of the Twokenize class of ark-tweet-nlp ☆142 · Updated 7 years ago
- Automatic labeling for topic models ☆56 · Updated 9 years ago
- Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text" ☆145 · Updated 2 years ago
- Estimating how similar two sets are using MinHash (Jaccard similarity coefficient) ☆31 · Updated 12 years ago
- Twitter hashtag prediction ☆281 · Updated 8 years ago
- Package for statistically significant linguistic change ☆56 · Updated 2 years ago
- ☆216 · Updated 3 years ago
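Several entries above (for example, the MinHash LSH and pure-Python LSH projects) use locality-sensitive hashing so that near-duplicate documents can be found without comparing every pair. The sketch below shows the common banding approach over MinHash signatures. It is not taken from any of the listed repositories; the band and row counts, the shingling, and the function names (`signature`, `lsh_candidate_pairs`) are hypothetical choices for illustration.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

# Illustrative sketch: names and parameters are hypothetical, not from any listed repo.
_PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def shingles(text, k=3):
    """Word-level k-shingles (k=3 is an assumed default)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def signature(shingle_set, params):
    """MinHash signature: minimum of each hash function over stable shingle IDs."""
    ids = [int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")
           for s in shingle_set]
    return [min((a * x + b) % _PRIME for x in ids) for a, b in params]


def lsh_candidate_pairs(docs, num_perm=128, bands=32, seed=1):
    """Return doc-ID pairs whose signatures agree on at least one whole band."""
    if num_perm % bands:
        raise ValueError("num_perm must be divisible by bands")
    rows = num_perm // bands
    rng = random.Random(seed)
    params = [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_perm)]

    buckets = defaultdict(set)  # (band index, band values) -> doc IDs in that bucket
    for doc_id, text in docs.items():
        sig = signature(shingles(text), params)
        for band in range(bands):
            band_key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[band_key].add(doc_id)

    pairs = set()
    for ids in buckets.values():
        # Every pair sharing a bucket becomes a candidate; sorting keeps pair order stable
        pairs.update(combinations(sorted(ids), 2))
    return pairs


base = " ".join(f"word{i}" for i in range(30))
docs = {
    "a": base,
    "b": base.rsplit(" ", 1)[0] + " changed",      # near-duplicate of "a"
    "c": " ".join(f"other{i}" for i in range(30)),  # unrelated document
}
print(sorted(lsh_candidate_pairs(docs)))
```

With `b` bands of `r` rows each, a pair with Jaccard similarity `s` lands in at least one shared bucket with probability 1 - (1 - s^r)^b, so more rows per band make the filter stricter and more bands make it more permissive. Candidate pairs would normally be re-checked with a full signature or exact comparison.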