Example Python code for comparing documents using MinHash
☆251 · Feb 11, 2019 · Updated 7 years ago
Alternatives and similar repositories for MinHash
Users who are interested in MinHash are comparing it to the libraries listed below.
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆293 · Jun 11, 2023 · Updated 2 years ago
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,882 · Jan 20, 2026 · Updated last month
- Estimating how similar are two sets using MinHash (Jaccard similarity coefficient) ☆30 · Feb 4, 2013 · Updated 13 years ago
- ... just because nltk is too heavy ☆35 · Jul 21, 2010 · Updated 15 years ago
- ☆13 · Feb 11, 2019 · Updated 7 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆87 · Oct 24, 2015 · Updated 10 years ago
- ☆32 · Nov 15, 2017 · Updated 8 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python ☆26 · Sep 23, 2011 · Updated 14 years ago
- An efficient simhash implementation for python ☆128 · Oct 25, 2019 · Updated 6 years ago
- ☆12 · Feb 9, 2019 · Updated 7 years ago
- Creates a Lucene index out of files from a local folder ☆13 · Aug 8, 2014 · Updated 11 years ago
- ☆12 · May 2, 2025 · Updated 10 months ago
- Code for "Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking" (https://arxiv.org/abs/2… ☆14 · Feb 2, 2026 · Updated last month
- Simhash and near-duplicate detection ☆424 · May 15, 2023 · Updated 2 years ago
- Immutable (a.k.a. persistent or pure-functional) deque, set, and map data structures in portable Scheme. ☆14 · Sep 14, 2020 · Updated 5 years ago
- Python code for implementing embeddings in the Wasserstein space of elliptical distributions ☆11 · Jul 22, 2020 · Updated 5 years ago
- Common Lisp interface to D-Wave's Python Pack for adiabatic quantum computer energy programming ☆11 · Jul 30, 2015 · Updated 10 years ago
- This repository contains the Framester resource, the main outcome of the framester project. ☆33 · Oct 22, 2025 · Updated 4 months ago
- 6th Place Solution for the Google - Isolated Sign Language Recognition Kaggle Competition ☆13 · May 4, 2023 · Updated 2 years ago
- Utility to translate NIF files across identifier schemes, such as DBpedia and Wikidata ☆11 · Aug 24, 2019 · Updated 6 years ago
- Twitter data sets for Named Entity Extraction and Disambiguation ☆17 · Jun 26, 2014 · Updated 11 years ago
- A lightweight python actor framework ☆19 · Jan 29, 2016 · Updated 10 years ago
- ☆16 · Jul 23, 2023 · Updated 2 years ago
- Effect of tokenization on transformers for biological sequence ☆22 · Dec 31, 2025 · Updated 2 months ago
- ☆20 · Jan 9, 2024 · Updated 2 years ago
- Data-structure for online/streaming clustering of non-stationary data. ☆16 · Jul 15, 2016 · Updated 9 years ago
- Active Learning for text classification using scikit-learn ☆24 · Jun 6, 2019 · Updated 6 years ago
- Example SPARQL queries, mostly for working with ZBW data sets ☆16 · Oct 8, 2025 · Updated 5 months ago
- Text pattern search using marisa-trie ☆18 · Jan 26, 2025 · Updated last year
- A toolkit for generating paraphrase vector representations for words in context ☆23 · May 19, 2015 · Updated 10 years ago
- Weighted MinHash implementation on CUDA (multi-gpu). ☆121 · Nov 29, 2023 · Updated 2 years ago
- Python package aiding in entity disambiguation based on string and location matching ☆18 · Nov 2, 2023 · Updated 2 years ago
- LTR DNN in tensorflow, an improvement of DSSM ☆21 · Oct 4, 2017 · Updated 8 years ago
- Sume is an implementation of the concept-based ILP model for summarization. ☆37 · Aug 20, 2018 · Updated 7 years ago
- A DSL to build Lucene text queries in Python. ☆38 · Jan 5, 2017 · Updated 9 years ago
- Transform unstructured document collections to structured Linked Data ☆29 · Sep 12, 2025 · Updated 5 months ago
- A codenames bot playing the part of the spymaster. ☆22 · Jan 7, 2018 · Updated 8 years ago
- Source code from my Master's thesis @Polytechnique Montréal. A solution to the assortment optimization problem, able to deal with large n… ☆19 · Apr 27, 2017 · Updated 8 years ago
- Farsi spellchecker ☆18 · Aug 30, 2017 · Updated 8 years ago