Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
☆290 · Jun 11, 2023 · Updated 3 years ago
Alternatives and similar repositories for LSH
Users that are interested in LSH are comparing it to the libraries listed below.
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,939 · Updated this week
- Example Python code for comparing documents using MinHash ☆252 · Feb 11, 2019 · Updated 7 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆87 · Oct 24, 2015 · Updated 10 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆149 · Sep 4, 2024 · Updated last year
- A fast Python implementation of locality sensitive hashing. ☆677 · Apr 30, 2020 · Updated 6 years ago
- Clustering documents based on LSH ☆14 · Apr 20, 2016 · Updated 10 years ago
- A simple implementation of locality sensitive hashing in python ☆25 · Feb 4, 2017 · Updated 9 years ago
- ☆32 · Nov 15, 2017 · Updated 8 years ago
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex ☆19 · Nov 18, 2022 · Updated 3 years ago
- Weighted MinHash implementation on CUDA (multi-gpu). ☆122 · Nov 29, 2023 · Updated 2 years ago
- ☆13 · Feb 11, 2019 · Updated 7 years ago
- Fast fuzzy text search ☆12 · May 16, 2023 · Updated 3 years ago
- Simhashing in C++ ☆136 · Feb 14, 2023 · Updated 3 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 · Mar 16, 2017 · Updated 9 years ago
- Locality Sensitive Hashing for semantic similarity (Python 3.x) ☆15 · Jun 8, 2018 · Updated 8 years ago
- Web content extraction using machine learning ☆34 · Mar 3, 2021 · Updated 5 years ago
- A small utility for converting Stanford GloVe vectors to HDF5 / NumPy ☆12 · Apr 4, 2017 · Updated 9 years ago
- Collection of some algorithms for entity resolution ☆28 · Sep 7, 2015 · Updated 10 years ago
- Materials for the Neural Network tutorial at PyData NYC 2019 ☆15 · Feb 15, 2023 · Updated 3 years ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆772 · Feb 23, 2023 · Updated 3 years ago
- Locality Sensitive Hashing ☆80 · May 29, 2026 · Updated 3 weeks ago
- Neural LSH [ICLR 2020] - Using supervised learning to produce better space partitions for fast nearest neighbor search. ☆73 · Jan 20, 2021 · Updated 5 years ago
- Open Source Implementation of Simhash in Python ☆24 · Sep 14, 2017 · Updated 8 years ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,482 · Jul 29, 2025 · Updated 11 months ago
- Similarity comparison of papers using a locality-sensitive hashing algorithm ☆17 · Apr 11, 2019 · Updated 7 years ago
- A lightweight command line interface for the management of arbitrary machine learning tasks ☆19 · Jan 29, 2021 · Updated 5 years ago
- Detecting near duplicates using Moses Charikar's algorithm ☆20 · Apr 27, 2026 · Updated 2 months ago
- All-pair set similarity search on millions of sets in Python and on a laptop ☆603 · Oct 11, 2022 · Updated 3 years ago
- A Python Implementation of Simhash Algorithm ☆1,037 · Mar 24, 2022 · Updated 4 years ago
- A Java implementation of Locality Sensitive Hashing (LSH) ☆298 · Nov 19, 2022 · Updated 3 years ago
- Python wrapper around SVDLIBC, a fast library for sparse Singular Value Decomposition ☆55 · Aug 16, 2013 · Updated 12 years ago
- A maximum-strength name parser for record linkage. ☆41 · Sep 3, 2025 · Updated 9 months ago
- Extractive and Compressive Neural Summarization Based on Summary State Representations (NAACL 2019) ☆16 · May 12, 2020 · Updated 6 years ago
- Pretrained Biomedical Name Encoder ☆15 · Jul 28, 2019 · Updated 6 years ago
- A simple tool for small scale experiments using bayesian optimization ☆35 · Aug 14, 2018 · Updated 7 years ago
- Parallel Semi-Supervised Latent Dirichlet Allocation ☆33 · Jan 21, 2022 · Updated 4 years ago
- A Cython implementation of the affine gap string distance ☆57 · Jan 23, 2023 · Updated 3 years ago
- Search 'from' and 'to' strings to learn a text cleaning mapping ☆17 · Aug 29, 2015 · Updated 10 years ago
- This is the implementation code for the WWW2021 paper "Variation Control and Evaluation for Generative Slate Recommendation" ☆15 · Jun 7, 2021 · Updated 5 years ago