Example Python code for comparing documents using MinHash
☆252 · Feb 11, 2019 · Updated 7 years ago
Alternatives and similar repositories for MinHash
Users that are interested in MinHash are comparing it to the libraries listed below.
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,928 · Apr 18, 2026 · Updated last month
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆290 · Jun 11, 2023 · Updated 2 years ago
- LSH based high dimensional clustering for sets and points ☆80 · Nov 15, 2014 · Updated 11 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆87 · Oct 24, 2015 · Updated 10 years ago
- ☆13 · Feb 11, 2019 · Updated 7 years ago
- k-shingling for text to help compare similarity ☆18 · Nov 11, 2019 · Updated 6 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python ☆26 · Sep 23, 2011 · Updated 14 years ago
- ☆32 · Nov 15, 2017 · Updated 8 years ago
- Keyphrase Extraction Prototypes ☆15 · Nov 24, 2016 · Updated 9 years ago
- LSH Scheme based on Longest Circular Co-Substring (SIGMOD 2020) ☆14 · Jul 8, 2021 · Updated 4 years ago
- Simhash and near-duplicate detection ☆423 · May 15, 2023 · Updated 3 years ago
- An efficient simhash implementation for python ☆128 · Oct 25, 2019 · Updated 6 years ago
- Simple NLP Search - Dataset Generator ☆17 · Apr 29, 2016 · Updated 10 years ago
- Collection of some algorithms for entity resolution ☆28 · Sep 7, 2015 · Updated 10 years ago
- ☆23 · Apr 26, 2018 · Updated 8 years ago
- Query-Aware LSH for Approximate NNS (In-Memory Version of QALSH) ☆16 · Jul 9, 2021 · Updated 4 years ago
- ☆12 · Feb 9, 2019 · Updated 7 years ago
- ☆21 · Dec 8, 2022 · Updated 3 years ago
- Utility to translate NIF files across identifier schemes, such as DBpedia and Wikidata ☆11 · Aug 24, 2019 · Updated 6 years ago
- [hibernating] Dynamic topic models ☆39 · Jun 22, 2015 · Updated 10 years ago
- This repository contains the Framester resource, the main outcome of the framester project. ☆34 · Oct 22, 2025 · Updated 7 months ago
- Bayesian Personalized Ranking ☆213 · Mar 2, 2016 · Updated 10 years ago
- LTR DNN in tensorflow, an improvement of DSSM ☆21 · Oct 4, 2017 · Updated 8 years ago
- Find near-duplicate documents using minhashing implemented in Go. ☆16 · Dec 22, 2015 · Updated 10 years ago
- A Python Implementation of Simhash Algorithm ☆1,038 · Mar 24, 2022 · Updated 4 years ago
- Tweets annotated with coarse-grained sense labels (supersenses) ☆13 · Jun 13, 2014 · Updated 11 years ago
- ☆14 · Jun 18, 2020 · Updated 5 years ago
- ☆26 · Sep 6, 2018 · Updated 7 years ago
- An implementation of efficient LSH inspired by fruit fly brain ☆88 · Dec 23, 2018 · Updated 7 years ago
- ☆16 · Jul 23, 2023 · Updated 2 years ago
- Deduplicates property owners in Massachusetts using the MassGIS standardized assessors' parcel dataset and the OpenCorporates Bulk Data p… ☆13 · May 18, 2026 · Updated 3 weeks ago
- Accessing the Facebook Marketing API using httr in R, for demographic researchers ☆21 · Nov 8, 2017 · Updated 8 years ago
- Active Learning for text classification using scikit-learn ☆24 · Jun 6, 2019 · Updated 7 years ago
- An interactive Shiny app that allows users to transform spatial data related to a central point with a variety of non-linear distance tra… ☆10 · Feb 20, 2025 · Updated last year
- PlaneDict is a class that provides convenient work with nested dictionaries ☆13 · May 11, 2018 · Updated 8 years ago
- Solr SearchComponent for altering and re-executing queries that produce poor results ☆14 · May 12, 2021 · Updated 5 years ago
- Twitter data sets for Named Entity Extraction and Disambiguation ☆17 · Jun 26, 2014 · Updated 11 years ago
- I am currently a Senior Researcher at L3S Research Center, Leibniz University Hannover, Germany. ☆24 · May 17, 2021 · Updated 5 years ago
- GSDMM: Short text clustering (Rust implementation) ☆24 · Apr 26, 2023 · Updated 3 years ago