Example Python code for comparing documents using MinHash
☆252Feb 11, 2019Updated 7 years ago
Alternatives and similar repositories for MinHash
Users that are interested in MinHash are comparing it to the libraries listed below.
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW☆2,916Apr 18, 2026Updated last month
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents☆291Jun 11, 2023Updated 2 years ago
- LSH based high dimensional clustering for sets and points☆80Nov 15, 2014Updated 11 years ago
- A pure python implementation of locality sensitive hashing for text documents☆87Oct 24, 2015Updated 10 years ago
- ☆13Feb 11, 2019Updated 7 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python☆26Sep 23, 2011Updated 14 years ago
- ☆32Nov 15, 2017Updated 8 years ago
- LSH Scheme based on Longest Circular Co-Substring (SIGMOD 2020)☆14Jul 8, 2021Updated 4 years ago
- LSH index for approximate set containment search☆62Jun 27, 2022Updated 3 years ago
- Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.☆22Dec 14, 2024Updated last year
- Simhash and near-duplicate detection☆423May 15, 2023Updated 3 years ago
- Simple NLP Search - Dataset Generator☆17Apr 29, 2016Updated 10 years ago
- Simple docker deployment of document layout analysis using detectron2☆19Nov 7, 2021Updated 4 years ago
- Collection of some algorithms for entity resolution☆28Sep 7, 2015Updated 10 years ago
- [hibernating] Dynamic topic models☆39Jun 22, 2015Updated 10 years ago
- This repository contains the Framester resource, the main outcome of the framester project.☆34Oct 22, 2025Updated 6 months ago
- Attention based aspect extraction via pytorch☆14Jun 8, 2020Updated 5 years ago
- Weighted MinHash implementation on CUDA (multi-gpu).☆122Nov 29, 2023Updated 2 years ago
- LTR DNN in tensorflow, an improvement of DSSM☆21Oct 4, 2017Updated 8 years ago
- ♥ Essential Functions for DNA Manipulation☆20Jun 15, 2025Updated 11 months ago
- A Python Implementation of Simhash Algorithm☆1,037Mar 24, 2022Updated 4 years ago
- PAINS filter using RDKit☆11Mar 17, 2015Updated 11 years ago
- Python code for implementing embeddings in the Wasserstein space of elliptical distributions☆11Jul 22, 2020Updated 5 years ago
- Tweets annotated with coarse-grained sense labels (supersenses)☆13Jun 13, 2014Updated 11 years ago
- ☆13Feb 16, 2021Updated 5 years ago
- ☆14Jun 18, 2020Updated 5 years ago
- Customer simulation for direct marketing experiments☆20Jul 9, 2021Updated 4 years ago
- A zero-shot relation extractor, easily downloadable from the HuggingFace repo.☆12Aug 13, 2021Updated 4 years ago
- Text pattern search using marisa-trie☆19Jan 26, 2025Updated last year
- Python implementation of the dgim algorithm: Compact datastructure to estimate the number of "True" in the last N elements of a boolean s…☆19Dec 26, 2022Updated 3 years ago
- ☆16Jul 23, 2023Updated 2 years ago
- Accessing the Facebook Marketing API using httr in R, for demographic researchers☆21Nov 8, 2017Updated 8 years ago
- Active Learning for text classification using scikit-learn☆24Jun 6, 2019Updated 6 years ago
- 💫 Easily install and load packages for Twitter network analysis and visualisation☆20May 13, 2020Updated 6 years ago
- SetSketch: Filling the Gap between MinHash and HyperLogLog☆49Aug 11, 2021Updated 4 years ago
- PlaneDict is a class that provides convenient work with nested dictionaries☆13May 11, 2018Updated 8 years ago
- A word2vec snippet☆22Jul 20, 2017Updated 8 years ago
- Solr SearchComponent for altering and re-executing queries that produce poor results☆14May 12, 2021Updated 5 years ago
- Twitter data sets for Named Entity Extraction and Disambiguation☆17Jun 26, 2014Updated 11 years ago