Example Python code for comparing documents using MinHash
☆251 · Feb 11, 2019 · Updated 7 years ago
Alternatives and similar repositories for MinHash
Users who are interested in MinHash are comparing it to the libraries listed below.
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,904 · Updated this week
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆292 · Jun 11, 2023 · Updated 2 years ago
- LSH based high dimensional clustering for sets and points ☆80 · Nov 15, 2014 · Updated 11 years ago
- ☆13 · Feb 11, 2019 · Updated 7 years ago
- k-shingling for text to help compare similarity ☆18 · Nov 11, 2019 · Updated 6 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python ☆26 · Sep 23, 2011 · Updated 14 years ago
- ☆32 · Nov 15, 2017 · Updated 8 years ago
- Keyphrase Extraction Prototypes ☆15 · Nov 24, 2016 · Updated 9 years ago
- Fast approximation of similarity for sets of very different sizes ☆20 · Mar 8, 2022 · Updated 4 years ago
- RNA-Skim: a rapid method for RNA-Seq quantification at transcript level ☆19 · Sep 3, 2017 · Updated 8 years ago
- LSH index for approximate set containment search ☆61 · Jun 27, 2022 · Updated 3 years ago
- Simhash and near-duplicate detection ☆424 · May 15, 2023 · Updated 2 years ago
- An efficient simhash implementation for python ☆128 · Oct 25, 2019 · Updated 6 years ago
- Simple NLP Search - Dataset Generator ☆17 · Apr 29, 2016 · Updated 9 years ago
- Simple docker deployment of document layout analysis using detectron2 ☆19 · Nov 7, 2021 · Updated 4 years ago
- Collection of some algorithms for entity resolution ☆28 · Sep 7, 2015 · Updated 10 years ago
- ☆23 · Apr 26, 2018 · Updated 7 years ago
- ☆12 · Feb 9, 2019 · Updated 7 years ago
- A high performance lock free map type for go. ☆20 · Apr 19, 2018 · Updated 8 years ago
- ☆21 · Dec 8, 2022 · Updated 3 years ago
- Utility to translate NIF files across identifier schemes, such as DBpedia and Wikidata ☆11 · Aug 24, 2019 · Updated 6 years ago
- [hibernating] Dynamic topic models ☆39 · Jun 22, 2015 · Updated 10 years ago
- Weighted MinHash implementation on CUDA (multi-gpu). ☆122 · Nov 29, 2023 · Updated 2 years ago
- This repo contains code samples for Microsoft text translation. ☆19 · Oct 31, 2023 · Updated 2 years ago
- LTR DNN in tensorflow, an improvement of DSSM ☆21 · Oct 4, 2017 · Updated 8 years ago
- Creates a Lucene index out of files from a local folder ☆13 · Aug 8, 2014 · Updated 11 years ago
- A Python Implementation of Simhash Algorithm ☆1,035 · Mar 24, 2022 · Updated 4 years ago
- Tweets annotated with coarse-grained sense labels (supersenses) ☆13 · Jun 13, 2014 · Updated 11 years ago
- Python code for implementing embeddings in the Wasserstein space of elliptical distributions ☆10 · Jul 22, 2020 · Updated 5 years ago
- Customer simulation for direct marketing experiments ☆20 · Jul 9, 2021 · Updated 4 years ago
- A zero-shot relation extractor, easily downloadable from the HuggingFace repo. ☆12 · Aug 13, 2021 · Updated 4 years ago
- An implementation of efficient LSH inspired by fruit fly brain ☆88 · Dec 23, 2018 · Updated 7 years ago
- Universal Forensic Indexer and Analyzer ☆10 · Jan 8, 2017 · Updated 9 years ago
- ☆16 · Jul 23, 2023 · Updated 2 years ago
- Unit and integration testing with PySpark can be tough to figure out, let's make that easier. ☆23 · Nov 3, 2015 · Updated 10 years ago
- Active Learning for text classification using scikit-learn ☆24 · Jun 6, 2019 · Updated 6 years ago
- LaTeX template for help with writing CSE dissertation at Ohio State University ☆23 · Mar 20, 2020 · Updated 6 years ago
- PlaneDict is a class that provides convenient work with nested dictionaries ☆13 · May 11, 2018 · Updated 7 years ago
- Solr SearchComponent for altering and re-executing queries that produce poor results ☆14 · May 12, 2021 · Updated 4 years ago