ing-bank / sparse_dot_topn
Python package to accelerate the sparse matrix multiplication and top-n similarity selection
☆419 · Jan 12, 2026 · Updated last month
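The description above names two steps: a sparse matrix product followed by keeping only the top-n similarity scores in each row. Below is a minimal sketch of that idea in plain SciPy, assuming CSR inputs and "top-n" meaning the n largest values per row. It is not sparse_dot_topn's API or implementation. The function name `sparse_dot_topn_sketch` and its `threshold` parameter are hypothetical. Unlike a dedicated implementation, it materializes the full product before pruning, and that full product is the memory cost such a package aims to avoid.

```python
import numpy as np
import scipy.sparse as sp


def sparse_dot_topn_sketch(A, B, top_n, threshold=0.0):
    """Hypothetical helper: compute A @ B, then keep at most `top_n`
    entries per row whose value is strictly greater than `threshold`."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # Full sparse product; a dedicated library would prune while multiplying.
    C = (A @ B).tocsr()
    C.sum_duplicates()  # ensure one stored value per (row, col)
    rows, cols, vals = [], [], []
    for i in range(C.shape[0]):
        start, end = C.indptr[i], C.indptr[i + 1]
        row_vals = C.data[start:end]
        row_cols = C.indices[start:end]
        # Drop scores at or below the threshold (this also removes explicit zeros).
        mask = row_vals > threshold
        row_vals, row_cols = row_vals[mask], row_cols[mask]
        if row_vals.size > top_n:
            # Positions of the top_n largest values, in no particular order.
            keep = np.argpartition(-row_vals, top_n - 1)[:top_n]
            row_vals, row_cols = row_vals[keep], row_cols[keep]
        rows.extend([i] * row_vals.size)
        cols.extend(row_cols.tolist())
        vals.extend(row_vals.tolist())
    return sp.csr_matrix(
        (np.array(vals, dtype=C.dtype),
         (np.array(rows, dtype=np.int64), np.array(cols, dtype=np.int64))),
        shape=C.shape,
    )


# Usage: 2 "query" rows scored against 2 "candidate" columns.
A = sp.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
B = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
C = sparse_dot_topn_sketch(A, B, top_n=1)
print(C.toarray())
```

In a string-matching setting, A might plausibly be a TF-IDF matrix of one set of strings and B the transpose of another, so each output row holds the best-scoring candidates for one string. That pairing is an assumption for illustration and is not stated on this page.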
Alternatives and similar repositories for sparse_dot_topn
Users who are interested in sparse_dot_topn are comparing it to the libraries listed below.
- Super Fast String Matching in Python ☆371 · Mar 14, 2025 · Updated 11 months ago
- Fuzzy string matching, grouping, and evaluation. ☆788 · Jul 10, 2025 · Updated 7 months ago
- Slack: #team-frontends-champions ☆16 · Apr 24, 2025 · Updated 9 months ago
- Abstractions for feature engineering on large graphs of tabular data. ☆24 · Nov 18, 2025 · Updated 2 months ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,045 · Feb 21, 2024 · Updated last year
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆510 · Jan 9, 2026 · Updated last month
- Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends ☆1,939 · Feb 6, 2026 · Updated last week
- Dataframe Integration with spaCy. ☆103 · Mar 12, 2021 · Updated 4 years ago
- ☆13 · Dec 21, 2021 · Updated 4 years ago
- Python wrapper for a C++ Double Metaphone ☆15 · Jan 12, 2026 · Updated last month
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… ☆856 · Jan 23, 2026 · Updated 3 weeks ago
- Extra blocks for scikit-learn pipelines. ☆1,377 · Updated this week
- just a bunch of useful embeddings for scikit-learn pipelines ☆521 · Sep 29, 2025 · Updated 4 months ago
- Company Name Processor written in Python ☆350 · Jan 16, 2026 · Updated 3 weeks ago
- Doubt your data, find bad labels. ☆516 · Jul 15, 2024 · Updated last year
- Jupyter Widget to display resources used by the kernels ☆13 · Aug 11, 2021 · Updated 4 years ago
- Spark Monitoring ☆13 · Feb 28, 2023 · Updated 2 years ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,436 · Jul 29, 2025 · Updated 6 months ago
- Toolkit to help understand "what lies" in word embeddings. Also benchmarking! ☆474 · Feb 6, 2023 · Updated 3 years ago
- 📛 Fuzzy Name Matching with Machine Learning ☆266 · Jun 17, 2024 · Updated last year
- Entity Linker solution ☆1,205 · Sep 21, 2023 · Updated 2 years ago
- Bag of, not words, but tricks! ☆68 · Oct 31, 2023 · Updated 2 years ago
- Tree-based indexes for neural-search ☆31 · Mar 4, 2024 · Updated last year
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆919 · Oct 2, 2020 · Updated 5 years ago
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,515 · Apr 18, 2025 · Updated 9 months ago
- Zalo AI Challenge 2020 - Top 2 @ Voice Verification ☆15 · Oct 4, 2022 · Updated 3 years ago
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,869 · Jan 20, 2026 · Updated 3 weeks ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆76 · Jan 22, 2026 · Updated 3 weeks ago
- Fast and Effective Biomedical Entity Linking Using a Dual Encoder ☆18 · Apr 21, 2022 · Updated 3 years ago
- ☆13 · Mar 22, 2022 · Updated 3 years ago
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,189 · Dec 15, 2025 · Updated last month
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ☆2,643 · Mar 20, 2024 · Updated last year
- A simple and efficient tool to parallelize Pandas operations on all available CPUs ☆3,809 · Jul 9, 2024 · Updated last year
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- Top2Vec learns jointly embedded topic, document and word vectors. ☆3,105 · Nov 14, 2024 · Updated last year
- All-pair set similarity search on millions of sets in Python and on a laptop ☆604 · Oct 11, 2022 · Updated 3 years ago
- An in-depth tutorial on sklearn's Pipeline and FeatureUnion classes. ☆16 · May 5, 2017 · Updated 8 years ago
- REL: Radboud Entity Linker ☆317 · Apr 9, 2024 · Updated last year
- Entity Matching Model solves the problem of matching company names between two possibly very large datasets. ☆89 · Nov 25, 2025 · Updated 2 months ago