Example Python code for comparing documents using MinHash
☆251 · Feb 11, 2019 · Updated 7 years ago
Alternatives and similar repositories for MinHash
Users that are interested in MinHash are comparing it to the libraries listed below.
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,892 · Updated this week
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆293 · Jun 11, 2023 · Updated 2 years ago
- LSH based high dimensional clustering for sets and points ☆80 · Nov 15, 2014 · Updated 11 years ago
- Estimating how similar are two sets using MinHash (Jaccard similarity coefficient) ☆30 · Feb 4, 2013 · Updated 13 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆87 · Oct 24, 2015 · Updated 10 years ago
- ☆13 · Feb 11, 2019 · Updated 7 years ago
- k-shingling for text to help compare similarity ☆18 · Nov 11, 2019 · Updated 6 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python ☆26 · Sep 23, 2011 · Updated 14 years ago
- ☆32 · Nov 15, 2017 · Updated 8 years ago
- Keyphrase Extraction Prototypes ☆15 · Nov 24, 2016 · Updated 9 years ago
- Fast approximation of similarity for sets of very different sizes ☆20 · Mar 8, 2022 · Updated 4 years ago
- ... just because nltk is too heavy ☆35 · Jul 21, 2010 · Updated 15 years ago
- LSH Scheme based on Longest Circular Co-Substring (SIGMOD 2020) ☆14 · Jul 8, 2021 · Updated 4 years ago
- LSH index for approximate set containment search ☆61 · Jun 27, 2022 · Updated 3 years ago
- Simhash and near-duplicate detection ☆424 · May 15, 2023 · Updated 2 years ago
- Simple docker deployment of document layout analysis using detectron2 ☆19 · Nov 7, 2021 · Updated 4 years ago
- Collection of some algorithms for entity resolution ☆28 · Sep 7, 2015 · Updated 10 years ago
- Query-Aware LSH for Approximate NNS (In-Memory Version of QALSH) ☆16 · Jul 9, 2021 · Updated 4 years ago
- Utility to translate NIF files across identifier schemes, such as DBpedia and Wikidata ☆11 · Aug 24, 2019 · Updated 6 years ago
- [hibernating] Dynamic topic models ☆39 · Jun 22, 2015 · Updated 10 years ago
- This repository contains the Framester resource, the main outcome of the framester project. ☆33 · Oct 22, 2025 · Updated 5 months ago
- Bayesian Personalized Ranking ☆214 · Mar 2, 2016 · Updated 10 years ago
- LTR DNN in tensorflow, an improvement of DSSM ☆21 · Oct 4, 2017 · Updated 8 years ago
- Creates a Lucene index out of files from a local folder ☆13 · Aug 8, 2014 · Updated 11 years ago
- A Python Implementation of Simhash Algorithm ☆1,036 · Mar 24, 2022 · Updated 4 years ago
- ☆13 · Feb 16, 2021 · Updated 5 years ago
- Text pattern search using marisa-trie ☆18 · Jan 26, 2025 · Updated last year
- ☆26 · Sep 6, 2018 · Updated 7 years ago
- An implementation of efficient LSH inspired by fruit fly brain ☆88 · Dec 23, 2018 · Updated 7 years ago
- My notes on reading "Domain Driven Design" by Eric Evans ☆12 · Jul 18, 2015 · Updated 10 years ago
- ☆16 · Jul 23, 2023 · Updated 2 years ago
- Accessing the Facebook Marketing API using httr in R, for demographic researchers ☆21 · Nov 8, 2017 · Updated 8 years ago
- Active Learning for text classification using scikit-learn ☆24 · Jun 6, 2019 · Updated 6 years ago
- 💫 Easily install and load packages for Twitter network analysis and visualisation ☆20 · May 13, 2020 · Updated 5 years ago
- An interactive Shiny app that allows users to transform spatial data related to a central point with a variety of non-linear distance tra… ☆10 · Feb 20, 2025 · Updated last year
- Twitter data sets for Named Entity Extraction and Disambiguation ☆17 · Jun 26, 2014 · Updated 11 years ago
- Simhash algorithm implementation for detecting duplicates across massive amounts of content ☆14 · Apr 23, 2016 · Updated 9 years ago
- 6th Place Solution for the Google - Isolated Sign Language Recognition Kaggle Competition ☆14 · May 4, 2023 · Updated 2 years ago
- Open Source Implementation of Simhash in Python ☆24 · Sep 14, 2017 · Updated 8 years ago