SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex
☆19Nov 18, 2022Updated 3 years ago
Alternatives and similar repositories for superminhash
Users that are interested in superminhash are comparing it to the libraries listed below.
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation☆25Jan 1, 2018Updated 8 years ago
- Open Source Implementation of Simhash in Python☆24Sep 14, 2017Updated 8 years ago
- Rust implementation of probminhash, superminhash and hyperloglog sketching algorithms☆31Jan 22, 2026Updated 2 months ago
- Github analytics powered by the world's fastest real-time analytics database☆15Jan 10, 2024Updated 2 years ago
- A Python Implementation of Simhash Algorithm☆1,036Mar 24, 2022Updated 4 years ago
- Time series classification on a hue histogram using kNN and Euclidean distance☆11Oct 15, 2015Updated 10 years ago
- This module contains an implementation of the Nilsimsa locality-sensitive hashing algorithm in Java.☆18May 31, 2019Updated 6 years ago
- Common Voice Generator using Speech Synthesizer☆13Jul 28, 2021Updated 4 years ago
- Repository for lecture "Data-Driven Demand Learning and Dynamic Pricing Strategies in Competitive Markets"☆12May 8, 2018Updated 7 years ago
- Open Thai Wikipedia QA Dataset made by iApp Technology☆14Feb 17, 2021Updated 5 years ago
- Visual Hash for matching copies of visually similar images.☆16Mar 17, 2025Updated last year
- Naive Bayes classifier for detection of language and spelling correction☆10Mar 2, 2020Updated 6 years ago
- v4l2 implementation for erlang. Simple and working.☆11Jun 2, 2020Updated 5 years ago
- A Benchmark Data Set for Community Question-Answering Research☆41Jul 24, 2017Updated 8 years ago
- Store multi-tenant metrics in ClickHouse☆14Aug 26, 2022Updated 3 years ago
- SymSpell Compound implementation in Python☆11Feb 6, 2018Updated 8 years ago
- Writer Identification of Handwritten Documents☆13Oct 18, 2017Updated 8 years ago
- MiniLM (BERT) embeddings from scratch☆19Aug 14, 2025Updated 7 months ago
- ☆16May 27, 2020Updated 5 years ago
- Parallel Universal Dependencies.☆15Nov 12, 2025Updated 4 months ago
- ☆10Jun 22, 2020Updated 5 years ago
- Archive for my CNN model for diabetic kaggle competition☆12Aug 1, 2015Updated 10 years ago
- MIDict (Multi-Index Dict) can be indexed by any "keys" or "values", suitable as a bidirectional/inverse dict or a multi-key/multi-value d…☆14May 19, 2016Updated 9 years ago
- Java implementation of immutable key-value storage based on sorted string table☆12Jun 26, 2015Updated 10 years ago
- A cloud native data mesh implementation☆12Jan 15, 2021Updated 5 years ago
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI☆13Oct 31, 2023Updated 2 years ago
- Thai smart home corpus with "Gowajee" hotword☆18Jul 30, 2023Updated 2 years ago
- ☆24Apr 29, 2025Updated 11 months ago
- an auto-sleeping and -waking framework around llama.cpp☆12Feb 8, 2025Updated last year
- String Distance using cython☆13Jan 19, 2020Updated 6 years ago
- An efficient simhash implementation for python☆128Oct 25, 2019Updated 6 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Mar 16, 2017Updated 9 years ago
- BM25F demo with lucene using BlendedTermQuery and a custom similarity☆15Oct 11, 2016Updated 9 years ago
- Poincare Embeddings for Word Vector Representations☆18Oct 29, 2017Updated 8 years ago
- iCQA - Intelligent Community Question Answering Framework☆31Aug 18, 2016Updated 9 years ago
- A pure Python implementation of Aho-Corasick algorithm.☆23Jul 10, 2018Updated 7 years ago
- Part of our solution to PLAsTiCC Kaggle challenge☆18Dec 27, 2018Updated 7 years ago
- Sample solution to build a deployment pipeline for Amazon SageMaker.☆13Jul 18, 2022Updated 3 years ago
- CubeQA—Question Answering on Statistical Linked Data☆21Sep 17, 2025Updated 6 months ago