Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
☆293Jun 11, 2023Updated 2 years ago
Alternatives and similar repositories for LSH
Users that are interested in LSH are comparing it to the libraries listed below.
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW☆2,892Mar 25, 2026Updated last week
- Example Python code for comparing documents using MinHash☆251Feb 11, 2019Updated 7 years ago
- A pure python implementation of locality sensitive hashing for text documents☆87Oct 24, 2015Updated 10 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.☆150Sep 4, 2024Updated last year
- A fast Python implementation of locality sensitive hashing.☆674Apr 30, 2020Updated 5 years ago
- A simple implementation of locality sensitive hashing in python☆25Feb 4, 2017Updated 9 years ago
- ☆32Nov 15, 2017Updated 8 years ago
- Simhash and near-duplicate detection☆424May 15, 2023Updated 2 years ago
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex☆19Nov 18, 2022Updated 3 years ago
- Weighted MinHash implementation on CUDA (multi-gpu).☆121Nov 29, 2023Updated 2 years ago
- ☆13Feb 11, 2019Updated 7 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Mar 16, 2017Updated 9 years ago
- Locality Sensitive Hashing for semantic similarity (Python 3.x)☆15Jun 8, 2018Updated 7 years ago
- Web content extraction using machine learning☆34Mar 3, 2021Updated 5 years ago
- Collection of some algorithms for entity resolution☆28Sep 7, 2015Updated 10 years ago
- Locality-sensitive hashing algorithm for text similarity comparisons☆58Apr 9, 2025Updated 11 months ago
- Materials for the Neural Network tutorial at PyData NYC 2019☆15Feb 15, 2023Updated 3 years ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive…☆771Feb 23, 2023Updated 3 years ago
- Locality Sensitive Hashing☆80Jul 12, 2023Updated 2 years ago
- Open Source Implementation of Simhash in Python☆24Sep 14, 2017Updated 8 years ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.☆4,450Jul 29, 2025Updated 8 months ago
- Web archiving utility library☆11Mar 11, 2026Updated 3 weeks ago
- A lightweight command line interface for the management of arbitrary machine learning tasks☆19Jan 29, 2021Updated 5 years ago
- A Python library for parsing unstructured western names into name components.☆617May 15, 2025Updated 10 months ago
- Detecting near duplicates using Moses Charikar's algorithm☆20Oct 7, 2014Updated 11 years ago
- All-pair set similarity search on millions of sets in Python and on a laptop☆603Oct 11, 2022Updated 3 years ago
- A Python Implementation of Simhash Algorithm☆1,036Mar 24, 2022Updated 4 years ago
- Python wrapper around SVDLIBC, a fast library for sparse Singular Value Decomposition☆55Aug 16, 2013Updated 12 years ago
- A maximum-strength name parser for record linkage.☆39Sep 3, 2025Updated 6 months ago
- Extractive and Compressive Neural Summarization Based on Summary State Representations (NAACL 2019)☆16May 12, 2020Updated 5 years ago
- The pipeline for the OSCAR corpus☆176Nov 9, 2025Updated 4 months ago
- Pretrained Biomedical Name Encoder☆15Jul 28, 2019Updated 6 years ago
- A simple tool for small scale experiments using bayesian optimization☆35Aug 14, 2018Updated 7 years ago
- Parallel Semi-Supervised Latent Dirichlet Allocation☆33Jan 21, 2022Updated 4 years ago
- Search 'from' and 'to' strings to learn a text cleaning mapping☆17Aug 29, 2015Updated 10 years ago
- This is the implementation code for the WWW2021 paper "Variation Control and Evaluation for Generative Slate Recommendation"☆15Jun 7, 2021Updated 4 years ago
- Scalable String Similarity Joins in Python☆39Jul 12, 2024Updated last year
- Slides and scripts from Data in Bahia meetups☆13Dec 8, 2022Updated 3 years ago
- LSHDB is a parallel and distributed data engine, which relies on Locality-Sensitive Hashing and noSQL systems, for performing record link…☆32Aug 30, 2022Updated 3 years ago