All-pair set similarity search on millions of sets in Python and on a laptop
☆604 · Oct 11, 2022 · Updated 3 years ago
Alternatives and similar repositories for SetSimilaritySearch
Users that are interested in SetSimilaritySearch are comparing it to the libraries listed below.
- Efficient set similarity search algorithms implemented in Go ☆35 · Aug 27, 2022 · Updated 3 years ago
- Sketch and LSH Index library for Java, including OPH methods as well as the Lazo method ☆15 · Dec 24, 2023 · Updated 2 years ago
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,920 · Apr 18, 2026 · Updated last month
- Pampy: The Pattern Matching for Python you always dreamed of. ☆3,527 · Jan 16, 2025 · Updated last year
- A natural language modeling framework based on PyTorch ☆6,299 · Oct 17, 2022 · Updated 3 years ago
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,529 · Apr 18, 2025 · Updated last year
- Fast word vectors with little memory usage in Python ☆416 · Jun 26, 2021 · Updated 4 years ago
- Efficient Counter that uses a limited (bounded) amount of memory regardless of data size. ☆932 · Nov 20, 2022 · Updated 3 years ago
- DartMinHash: Fast Sketching for Weighted Sets ☆12 · Dec 8, 2025 · Updated 5 months ago
- Feature engineering and machine learning: together at last! ☆26 · Jan 1, 2021 · Updated 5 years ago
- Small Image Library for Python 3 ☆415 · Dec 8, 2022 · Updated 3 years ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,380 · Oct 27, 2025 · Updated 6 months ago
- Funky takes shell functions to the next level by making them easier to define, more flexible, and more interactive. ☆669 · Jul 15, 2025 · Updated 10 months ago
- Learning embeddings for classification, retrieval and ranking. ☆3,956 · Dec 4, 2022 · Updated 3 years ago
- Quantized word vectors that take 8x-16x less space than regular word vectors ☆753 · Mar 31, 2020 · Updated 6 years ago
- two strange things to do with neural nets ☆15 · Feb 18, 2019 · Updated 7 years ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,463 · Jul 29, 2025 · Updated 9 months ago
- Python library for building highly effective data science workflows ☆947 · Jul 20, 2023 · Updated 2 years ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆918 · Oct 2, 2020 · Updated 5 years ago
- Snips Python library to extract meaning from text ☆3,968 · May 22, 2023 · Updated 3 years ago
- Python library that makes it easy for data scientists to create charts. ☆3,633 · Oct 16, 2024 · Updated last year
- Python memoization across program runs. ☆106 · Nov 19, 2018 · Updated 7 years ago
- ☆11 · Nov 17, 2017 · Updated 8 years ago
- Perform lexical analysis on words, one word at a time. ☆64 · Jun 6, 2018 · Updated 7 years ago
- A context-preserving word cloud generator ☆442 · Jul 6, 2023 · Updated 2 years ago
- 🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library.… ☆6,885 · Mar 6, 2026 · Updated 2 months ago
- Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://… ☆2,391 · Aug 26, 2021 · Updated 4 years ago
- Tensorflow implementation of Facebook TagSpace ☆74 · Jan 29, 2019 · Updated 7 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆291 · Jun 11, 2023 · Updated 2 years ago
- A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural … ☆2,933 · Nov 7, 2022 · Updated 3 years ago
- Concurrent data pipelines in Python >>> ☆1,597 · Jul 20, 2023 · Updated 2 years ago
- A fast, efficient universal vector embedding utility package. ☆1,659 · Aug 3, 2023 · Updated 2 years ago
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆14,239 · Oct 29, 2025 · Updated 6 months ago
- GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network. ☆1,266 · Oct 31, 2019 · Updated 6 years ago
- ☆3,171 · Nov 16, 2021 · Updated 4 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,504 · Apr 1, 2026 · Updated last month
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,160 · Jun 1, 2024 · Updated last year
- Python Fast Dataflow programming framework for data pipeline work (web crawlers, machine learning, quantitative trading, etc.) ☆1,196 · Feb 3, 2026 · Updated 3 months ago
- A Keras model that addresses the Quora Question Pairs dyadic prediction task. ☆14 · Feb 18, 2017 · Updated 9 years ago