Locality Sensitive Hashing using MinHash in Python/Cython to detect near-duplicate text documents
☆293 · Jun 11, 2023 · Updated 2 years ago
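The repository above detects near-duplicate documents with MinHash and LSH. Below is a minimal pure-Python sketch of that general technique. It is not code from the repository. It assumes character 5-gram shingles, salted BLAKE2b hashes standing in for random permutations, and 32 bands of 4 rows. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    # Lowercase, collapse whitespace, then take every overlapping run of k characters.
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(shingle_set, num_perm=128):
    # Assumption of this sketch: each salted BLAKE2b hash plays the role of one
    # random permutation; the signature keeps the minimum 64-bit hash per permutation.
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingle_set
        ))
    return signature


def jaccard_estimate(sig_a, sig_b):
    # The fraction of positions where two signatures agree estimates their Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures, bands=32):
    # Split each signature into `bands` bands of `rows` values. Documents that agree on
    # every value of at least one band share a bucket and become a candidate pair.
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical example documents: "a" and "b" are near duplicates, "c" is unrelated.
docs = {
    "a": "The quick brown fox jumps over the lazy dog",
    "b": "The quick brown fox jumped over the lazy dog",
    "c": "Locality sensitive hashing groups similar items together",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
candidates = lsh_candidates(signatures)
print(candidates)
print(round(jaccard_estimate(signatures["a"], signatures["b"]), 2))
```

With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. Using more rows per band makes the filter stricter, and using more bands makes it more permissive. Candidates would then typically be checked with the signature estimate or the exact Jaccard similarity.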
Alternatives and similar repositories for LSH
Users who are interested in LSH compare it with the libraries listed below.
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW · ☆2,892 · Updated this week
- Example Python code for comparing documents using MinHash · ☆251 · Feb 11, 2019 · Updated 7 years ago
- A pure Python implementation of locality sensitive hashing for text documents · ☆87 · Oct 24, 2015 · Updated 10 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets · ☆150 · Sep 4, 2024 · Updated last year
- A fast Python implementation of locality sensitive hashing · ☆674 · Apr 30, 2020 · Updated 5 years ago
- Clustering documents based on LSH · ☆14 · Apr 20, 2016 · Updated 9 years ago
- A simple implementation of locality sensitive hashing in Python · ☆25 · Feb 4, 2017 · Updated 9 years ago
- ☆32 · Nov 15, 2017 · Updated 8 years ago
- Cross-domain temporal information extractors: temporal expressions, events and temporal links · ☆21 · Oct 29, 2015 · Updated 10 years ago
- Simhash and near-duplicate detection · ☆424 · May 15, 2023 · Updated 2 years ago
- Weighted MinHash implementation on CUDA (multi-GPU) · ☆121 · Nov 29, 2023 · Updated 2 years ago
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex · ☆19 · Nov 18, 2022 · Updated 3 years ago
- ☆13 · Feb 11, 2019 · Updated 7 years ago
- Simhashing in C++ · ☆136 · Feb 14, 2023 · Updated 3 years ago
- Python wrapper for a C++ Double Metaphone · ☆15 · Jan 12, 2026 · Updated 2 months ago
- Python API for Various DB-Backed Simhash Clusters · ☆64 · Mar 16, 2017 · Updated 9 years ago
- Locality Sensitive Hashing for semantic similarity (Python 3.x) · ☆15 · Jun 8, 2018 · Updated 7 years ago
- Web content extraction using machine learning · ☆34 · Mar 3, 2021 · Updated 5 years ago
- A small utility for converting Stanford GloVe vectors to HDF5 / NumPy · ☆12 · Apr 4, 2017 · Updated 8 years ago
- Collection of some algorithms for entity resolution · ☆28 · Sep 7, 2015 · Updated 10 years ago
- Materials for the Neural Network tutorial at PyData NYC 2019 · ☆15 · Feb 15, 2023 · Updated 3 years ago
- Neural LSH [ICLR 2020] - Using supervised learning to produce better space partitions for fast nearest neighbor search · ☆73 · Jan 20, 2021 · Updated 5 years ago
- Open Source Implementation of Simhash in Python · ☆24 · Sep 14, 2017 · Updated 8 years ago
- A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution · ☆4,448 · Jul 29, 2025 · Updated 8 months ago
- Similarity comparison of research papers using locality sensitive hashing · ☆17 · Apr 11, 2019 · Updated 6 years ago
- A lightweight command line interface for the management of arbitrary machine learning tasks · ☆19 · Jan 29, 2021 · Updated 5 years ago
- A Python library for parsing unstructured Western names into name components · ☆617 · May 15, 2025 · Updated 10 months ago
- All-pair set similarity search on millions of sets in Python and on a laptop · ☆604 · Oct 11, 2022 · Updated 3 years ago
- A Python implementation of the Simhash algorithm · ☆1,036 · Mar 24, 2022 · Updated 4 years ago
- A Java implementation of Locality Sensitive Hashing (LSH) · ☆301 · Nov 19, 2022 · Updated 3 years ago
- Python wrapper around SVDLIBC, a fast library for sparse Singular Value Decomposition · ☆55 · Aug 16, 2013 · Updated 12 years ago
- A maximum-strength name parser for record linkage · ☆39 · Sep 3, 2025 · Updated 6 months ago
- Extractive and Compressive Neural Summarization Based on Summary State Representations (NAACL 2019) · ☆16 · May 12, 2020 · Updated 5 years ago
- The pipeline for the OSCAR corpus · ☆176 · Nov 9, 2025 · Updated 4 months ago
- Pretrained Biomedical Name Encoder · ☆15 · Jul 28, 2019 · Updated 6 years ago
- A simple tool for small-scale experiments using Bayesian optimization · ☆35 · Aug 14, 2018 · Updated 7 years ago
- A Cython implementation of the affine gap string distance · ☆57 · Jan 23, 2023 · Updated 3 years ago
- Search 'from' and 'to' strings to learn a text cleaning mapping · ☆17 · Aug 29, 2015 · Updated 10 years ago
- Implementation code for the WWW2021 paper "Variation Control and Evaluation for Generative Slate Recommendation" · ☆15 · Jun 7, 2021 · Updated 4 years ago