ing-bank / sparse_dot_topn
Python package to accelerate sparse matrix multiplication and top-n similarity selection
☆405 Updated last week
Alternatives and similar repositories for sparse_dot_topn:
Users who are interested in sparse_dot_topn compare it to the libraries listed below.
- Super Fast String Matching in Python ☆367 Updated last month
- Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4 ☆283 Updated 2 years ago
- Notebooks configured to be run with Binder, usually found on my blog. ☆42 Updated 2 years ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,001 Updated last year
- Data Analysis Baseline Library ☆727 Updated 4 months ago
- ☆189 Updated 11 months ago
- Fuzzy matching and more functionality for spaCy. ☆256 Updated 10 months ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆138 Updated 9 months ago
- A tool for compiling trained SKLearn models into other representations (such as SQL, Sympy or Excel formulas) ☆173 Updated 2 years ago
- Fuzzy string matching, grouping, and evaluation. ☆761 Updated 2 months ago
- Simplifies use of the Dedupe library via Pandas ☆136 Updated 2 years ago
- Group thousands of similar spreadsheet or database text entries in seconds ☆155 Updated last year
- Doubt your data, find bad labels. ☆511 Updated 9 months ago
- Fixes contractions such as `you're` to `you are` ☆317 Updated 2 years ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆922 Updated 8 months ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆500 Updated 3 months ago
- Python package for performing Entity and Text Matching using Deep Learning. ☆587 Updated 10 months ago
- Test-Driven Data Analysis Functions ☆299 Updated 3 weeks ago
- A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques. ☆468 Updated last year
- Easy pipelines for pandas DataFrames. ☆719 Updated 6 months ago
- Imputation of missing values in tables. ☆487 Updated 10 months ago
- Examples for using the dedupe library ☆411 Updated 8 months ago
- Python package for Gower distance ☆78 Updated 11 months ago
- Sensible multi-core apply function for Pandas ☆81 Updated this week
- Natural language processing support for Pandas dataframes. ☆216 Updated 2 months ago
- Confidence intervals for scikit-learn forest algorithms ☆287 Updated 2 weeks ago
- Textpipe: clean and extract metadata from text ☆301 Updated 3 years ago
- 📛 Fuzzy Name Matching with Machine Learning ☆264 Updated 10 months ago
- Time should be taken seer-iously ☆315 Updated 2 years ago
- just a bunch of useful embeddings for scikit-learn pipelines ☆497 Updated last month