All-pair set similarity search on millions of sets in Python and on a laptop
☆603 · Oct 11, 2022 · Updated 3 years ago
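The following is a minimal pure-Python sketch of the task named in the title, all-pair set similarity search. It is not the SetSimilaritySearch API and not necessarily its algorithm. It builds an inverted index so that only pairs sharing at least one element are compared exactly under Jaccard similarity. The helper names `jaccard` and `all_pairs_jaccard` are hypothetical, and the sketch assumes a threshold strictly greater than 0.

```python
from collections import defaultdict
from itertools import combinations
from typing import Hashable, Iterable, Sequence


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def all_pairs_jaccard(
    sets: Sequence[Iterable[Hashable]], threshold: float
) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair i < j with Jaccard >= threshold.

    Hypothetical helper for illustration; not the SetSimilaritySearch API.
    """
    if not 0.0 < threshold <= 1.0:
        # Disjoint pairs have similarity 0 and are never generated as candidates,
        # so this sketch only supports thresholds strictly above 0.
        raise ValueError("threshold must be in (0, 1]")

    normalized = [set(s) for s in sets]

    # Inverted index: element -> ids of the sets containing it (appended in ascending order).
    index: dict[Hashable, list[int]] = defaultdict(list)
    for set_id, s in enumerate(normalized):
        for element in s:
            index[element].append(set_id)

    # Candidate generation: only pairs sharing at least one element can score above 0.
    candidates: set[tuple[int, int]] = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))  # ids are ascending, so i < j

    # Verification: compute the exact similarity for each candidate pair.
    results = []
    for i, j in sorted(candidates):
        sim = jaccard(normalized[i], normalized[j])
        if sim >= threshold:
            results.append((i, j, sim))
    return results


if __name__ == "__main__":
    docs = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["a", "b", "c", "d"]]
    for i, j, sim in all_pairs_jaccard(docs, threshold=0.5):
        print(i, j, sim)
```

With the sample input and a threshold of 0.5, the pairs found are (0, 1, 0.5), (0, 3, 0.75) and (1, 3, 0.75). Set 2 shares no elements with the others, so it is never compared. Very frequent elements make this candidate step quadratic in the length of their posting lists. Tools aimed at millions of sets typically add further pruning, such as prefix filtering, on top of an index like this.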
Alternatives and similar repositories for SetSimilaritySearch
Users who are interested in SetSimilaritySearch are comparing it to the libraries listed below.
- Efficient set similarity search algorithms implemented in Go · ☆35 · Aug 27, 2022 · Updated 3 years ago
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW · ☆2,874 · Jan 20, 2026 · Updated last month
- Fast word vectors with little memory usage in Python · ☆416 · Jun 26, 2021 · Updated 4 years ago
- Pampy: The Pattern Matching for Python you always dreamed of. · ☆3,532 · Jan 16, 2025 · Updated last year
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. · ☆3,517 · Apr 18, 2025 · Updated 10 months ago
- A natural language modeling framework based on PyTorch · ☆6,305 · Oct 17, 2022 · Updated 3 years ago
- Efficient Counter that uses a limited (bounded) amount of memory regardless of data size. · ☆934 · Nov 20, 2022 · Updated 3 years ago
- Feature engineering and machine learning: together at last! · ☆25 · Jan 1, 2021 · Updated 5 years ago
- Funky takes shell functions to the next level by making them easier to define, more flexible, and more interactive. · ☆669 · Jul 15, 2025 · Updated 7 months ago
- Small Image Library for Python 3 · ☆418 · Dec 8, 2022 · Updated 3 years ago
- Snips Python library to extract meaning from text · ☆3,961 · May 22, 2023 · Updated 2 years ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) · ☆14,354 · Oct 27, 2025 · Updated 4 months ago
- Python library that makes it easy for data scientists to create charts. · ☆3,622 · Oct 16, 2024 · Updated last year
- Learning embeddings for classification, retrieval and ranking. · ☆3,959 · Dec 4, 2022 · Updated 3 years ago
- Python library for building highly effective data science workflows · ☆947 · Jul 20, 2023 · Updated 2 years ago
- Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://… · ☆2,391 · Aug 26, 2021 · Updated 4 years ago
- Aho-Corasick string replacement utility · ☆26 · Nov 25, 2019 · Updated 6 years ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. · ☆4,440 · Jul 29, 2025 · Updated 7 months ago
- Concurrent data pipelines in Python >>> · ☆1,595 · Jul 20, 2023 · Updated 2 years ago
- Quantized word vectors that take 8x-16x less space than regular word vectors · ☆752 · Mar 31, 2020 · Updated 5 years ago
- GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network. · ☆1,267 · Oct 31, 2019 · Updated 6 years ago
- Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings · ☆21 · Aug 16, 2017 · Updated 8 years ago
- Perform lexical analysis on words, one word at a time. · ☆64 · Jun 6, 2018 · Updated 7 years ago
- Python Fast Dataflow programming framework for Data pipeline work (Web Crawler, Machine Learning, Quantitative Trading, etc.) · ☆1,199 · Feb 3, 2026 · Updated last month
- A context-preserving word cloud generator · ☆442 · Jul 6, 2023 · Updated 2 years ago
- A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural … · ☆2,933 · Nov 7, 2022 · Updated 3 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… · ☆8,483 · Updated this week
- Python memoization across program runs. · ☆107 · Nov 19, 2018 · Updated 7 years ago
- An open source python library for automated feature engineering · ☆7,614 · Feb 3, 2026 · Updated last month
- Word semantics Deep Learning with Vanilla Python, Keras, Theano, TensorFlow, PyTorch · ☆14 · Apr 25, 2017 · Updated 8 years ago
- 🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library.… · ☆6,854 · Jan 28, 2026 · Updated last month
- Camelot: PDF Table Extraction for Humans · ☆3,717 · Jan 5, 2023 · Updated 3 years ago
- Just another HN PWA but with a "Read It Later" feature. https://brapifra.github.io/readhnlater-pwa/ · ☆30 · Dec 4, 2018 · Updated 7 years ago
- Organized Resources for Deep Learning Researchers and Developers · ☆3,191 · Dec 22, 2022 · Updated 3 years ago
- A fast, efficient universal vector embedding utility package. · ☆1,655 · Aug 3, 2023 · Updated 2 years ago
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk · ☆14,169 · Oct 29, 2025 · Updated 4 months ago
- Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applicat… · ☆1,026 · Jan 20, 2022 · Updated 4 years ago
- Visual analysis and diagnostic tools to facilitate machine learning model selection. · ☆4,395 · Feb 19, 2025 · Updated last year
- ☆3,172 · Nov 16, 2021 · Updated 4 years ago