Bergvca/string_grouper

Super Fast String Matching in Python

☆372

Alternatives and similar repositories for string_grouper

Users that are interested in string_grouper are comparing it to the libraries listed below.

ing-bank / sparse_dot_topn
Python package to accelerate the sparse matrix multiplication and top-n similarity selection
☆424 · Updated this week
MaartenGr / PolyFuzz
Fuzzy string matching, grouping, and evaluation.
☆801 · Jul 10, 2025 · Updated last year
dedupeio / doublemetaphone
Python wrapper for a C++ Double Metaphone
☆15 · Jan 12, 2026 · Updated 6 months ago
src-d / snippet-ranger
☆11 · Nov 17, 2017 · Updated 8 years ago
microsoft / spacy-ann-linker
spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking
☆86 · Oct 6, 2022 · Updated 3 years ago
J535D165 / recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
☆1,056 · Feb 21, 2024 · Updated 2 years ago
pmbaumgartner / setfit
☆42 · Apr 20, 2023 · Updated 3 years ago
RobinL / fuzzymatcher
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
☆286 · Aug 9, 2022 · Updated 3 years ago
rapidfuzz / RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
☆4,038 · Updated this week
psolin / cleanco
Company Name Processor written in Python
☆359 · Jun 23, 2026 · Updated last month
NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
☆924 · Sep 2, 2024 · Updated last year
dedupeio / affinegap
A Cython implementation of the affine gap string distance
☆57 · Jan 23, 2023 · Updated 3 years ago
koaning / spacy-report
Generate reports for spaCy models.
☆29 · May 27, 2022 · Updated 4 years ago
MartinoMensio / spacy-sentence-bert
Sentence transformers models for SpaCy
☆108 · Mar 9, 2023 · Updated 3 years ago
J535D165 / recordlinkage-annotator
A browser user interface for manual labeling of record pairs.
☆48 · Jun 23, 2023 · Updated 3 years ago
danielm-github / patentsmatch_bingsearchapproach
Match Patent Assignees with Compustat and SDC via Bing Search
☆55 · Sep 29, 2020 · Updated 5 years ago
dedupeio / dedupe
A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
☆4,487 · Jul 29, 2025 · Updated last year
Living-with-machines / DeezyMatch
A Flexible Deep Learning Approach to Fuzzy String Matching
☆152 · Oct 16, 2024 · Updated last year
ocastel / exact-extract
☆12 · Sep 2, 2021 · Updated 4 years ago
koaning / scikit-lego
Extra blocks for scikit-learn pipelines.
☆1,409 · Updated this week
src-d / ml-core
source{d} MLonCode foundation - core algorithms and models.
☆13 · Oct 17, 2019 · Updated 6 years ago
moj-analytical-services / splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
☆2,298 · Updated this week
davidberenstein1957 / fast-sentence-transformers
Simply, faster, sentence-transformers
☆144 · Aug 27, 2024 · Updated last year
orijtech / media-search
Media search's code
☆14 · Sep 15, 2018 · Updated 7 years ago
life4 / textdistance
📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
☆3,535 · Apr 18, 2025 · Updated last year
aysark / CV_estimating_bodyfat
Estimating Body Fat Using Computer Vision (openCV2, Python)
☆23 · Dec 18, 2014 · Updated 11 years ago
anhaidgroup / deepmatcher
Python package for performing Entity and Text Matching using Deep Learning.
☆622 · Jun 18, 2024 · Updated 2 years ago
TillerBurr / dash-query-builder
Dash Component created from ukrbublik/react-awesome-query-builder
☆13 · Updated this week
jamesturk / jellyfish
🪼 a python library for doing approximate and phonetic matching of strings.
☆2,227 · Updated this week
koaning / bulk
A Simple Bulk Labelling Tool
☆599 · Jul 29, 2025 · Updated last year
jbesomi / texthero
Text preprocessing, representation and visualization from zero to hero.
☆2,911 · Aug 29, 2023 · Updated 2 years ago
dshulyak / art
Concurrent (with OLC) Adaptive Radix Trie in Golang.
☆12 · Jul 31, 2020 · Updated 5 years ago
HLasse / TextDescriptives
A Python library for calculating a large variety of metrics from text
☆366 · May 5, 2026 · Updated 2 months ago
argilla-io / spacy-wordnet
spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface
☆261 · Aug 21, 2025 · Updated 11 months ago
koaning / embetter
just a bunch of useful embeddings for scikit-learn pipelines
☆527 · Feb 12, 2026 · Updated 5 months ago
willcrichton / wordtree
A Python library for generating word tree diagrams
☆28 · Jul 10, 2020 · Updated 6 years ago
ArjitJ / DIAL
Implementation of the paper "Deep Indexed Active Learning for Matching Heterogeneous Entity Representations"
☆17 · Dec 20, 2021 · Updated 4 years ago
Lyonk71 / pandas-dedupe
Simplifies use of the Dedupe library via Pandas
☆137 · Mar 30, 2023 · Updated 3 years ago
DerwenAI / pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
☆2,219 · Jun 24, 2026 · Updated last month