All-pair set similarity search on millions of sets in Python and on a laptop
☆604 · Oct 11, 2022 · Updated 3 years ago
Alternatives and similar repositories for SetSimilaritySearch
Users that are interested in SetSimilaritySearch are comparing it to the libraries listed below.
- Efficient set similarity search algorithms implemented in Go · ☆35 · Aug 27, 2022 · Updated 3 years ago
- Sketch and LSH Index library for Java, including OPH methods as well as the Lazo method · ☆15 · Dec 24, 2023 · Updated 2 years ago
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW · ☆2,890 · Jan 20, 2026 · Updated 2 months ago
- Pampy: The Pattern Matching for Python you always dreamed of. · ☆3,531 · Jan 16, 2025 · Updated last year
- A natural language modeling framework based on PyTorch · ☆6,306 · Oct 17, 2022 · Updated 3 years ago
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. · ☆3,525 · Apr 18, 2025 · Updated 11 months ago
- Fast word vectors with little memory usage in Python · ☆416 · Jun 26, 2021 · Updated 4 years ago
- Efficient Counter that uses a limited (bounded) amount of memory regardless of data size. · ☆934 · Nov 20, 2022 · Updated 3 years ago
- DartMinHash: Fast Sketching for Weighted Sets · ☆12 · Dec 8, 2025 · Updated 3 months ago
- Feature engineering and machine learning: together at last! · ☆25 · Jan 1, 2021 · Updated 5 years ago
- Small Image Library for Python 3 · ☆418 · Dec 8, 2022 · Updated 3 years ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) · ☆14,352 · Oct 27, 2025 · Updated 4 months ago
- Funky takes shell functions to the next level by making them easier to define, more flexible, and more interactive. · ☆669 · Jul 15, 2025 · Updated 8 months ago
- Learning embeddings for classification, retrieval and ranking. · ☆3,957 · Dec 4, 2022 · Updated 3 years ago
- Quantized word vectors that take 8x-16x less space than regular word vectors · ☆752 · Mar 31, 2020 · Updated 5 years ago
- two strange things to do with neural nets · ☆15 · Feb 18, 2019 · Updated 7 years ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. · ☆4,448 · Jul 29, 2025 · Updated 7 months ago
- Python library for building highly effective data science workflows · ☆947 · Jul 20, 2023 · Updated 2 years ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! · ☆919 · Oct 2, 2020 · Updated 5 years ago
- Snips Python library to extract meaning from text · ☆3,960 · May 22, 2023 · Updated 2 years ago
- Python library that makes it easy for data scientists to create charts. · ☆3,625 · Oct 16, 2024 · Updated last year
- Python memoization across program runs. · ☆106 · Nov 19, 2018 · Updated 7 years ago
- ☆12 · Nov 17, 2017 · Updated 8 years ago
- Perform lexical analysis on words, one word at a time. · ☆64 · Jun 6, 2018 · Updated 7 years ago
- A context-preserving word cloud generator · ☆442 · Jul 6, 2023 · Updated 2 years ago
- 🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library.… · ☆6,859 · Mar 6, 2026 · Updated 2 weeks ago
- Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://… · ☆2,391 · Aug 26, 2021 · Updated 4 years ago
- Tensorflow implementation of Facebook TagSpace · ☆74 · Jan 29, 2019 · Updated 7 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents · ☆293 · Jun 11, 2023 · Updated 2 years ago
- A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural … · ☆2,934 · Nov 7, 2022 · Updated 3 years ago
- Concurrent data pipelines in Python >>> · ☆1,596 · Jul 20, 2023 · Updated 2 years ago
- A fast, efficient universal vector embedding utility package. · ☆1,655 · Aug 3, 2023 · Updated 2 years ago
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk · ☆14,185 · Oct 29, 2025 · Updated 4 months ago
- GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network. · ☆1,266 · Oct 31, 2019 · Updated 6 years ago
- ☆3,172 · Nov 16, 2021 · Updated 4 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… · ☆8,498 · Mar 1, 2026 · Updated 3 weeks ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) · ☆1,161 · Jun 1, 2024 · Updated last year
- Python Fast Dataflow programming framework for Data pipeline work (Web Crawler, Machine Learning, Quantitative Trading, etc.) · ☆1,197 · Feb 3, 2026 · Updated last month
- A Keras model that addresses the Quora Question Pairs dyadic prediction task. · ☆14 · Feb 18, 2017 · Updated 9 years ago