MaartenGr/PolyFuzz

Fuzzy string matching, grouping, and evaluation.

☆800

Alternatives and similar repositories for PolyFuzz

Users who are interested in PolyFuzz are comparing it to the libraries listed below.

MaartenGr / KeyBERT
Minimal keyword extraction with BERT
☆4,203 · May 13, 2026 · Updated 2 months ago
MaartenGr / BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
☆7,748 · May 13, 2026 · Updated 2 months ago
ddangelov / Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
☆3,104 · Nov 14, 2024 · Updated last year
rapidfuzz / RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
☆4,024 · Updated this week
NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
☆925 · Sep 2, 2024 · Updated last year
erre-quadro / spikex
SpikeX - SpaCy Pipes for Knowledge Extraction
☆403 · Jul 30, 2021 · Updated 4 years ago
MilaNLProc / contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher…
☆1,272 · Jul 24, 2025 · Updated 11 months ago
kevinlu1248 / pyate
PYthon Automated Term Extraction
☆318 · Feb 8, 2023 · Updated 3 years ago
gandersen101 / spaczz
Fuzzy matching and more functionality for spaCy.
☆258 · Jul 6, 2024 · Updated 2 years ago
webis-de / small-text
Active Learning for Text Classification in Python
☆646 · May 24, 2026 · Updated last month
RelevanceAI / vectorhub
Vector Hub - Library for easy discovery and consumption of state-of-the-art models to turn data into vectors. (text2vec, image2vec, vide…
☆560 · Aug 20, 2024 · Updated last year
davidberenstein1957 / concise-concepts
This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with enti…
☆244 · Jun 19, 2023 · Updated 3 years ago
neomatrix369 / nlp_profiler
A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data…
☆244 · May 12, 2024 · Updated 2 years ago
oborchers / Fast_Sentence_Embeddings
Compute Sentence Embeddings Fast!
☆625 · Mar 2, 2023 · Updated 3 years ago
Bergvca / string_grouper
Super Fast String Matching in Python
☆371 · Updated this week
TimSchopf / KeyphraseVectorizers
Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a docum…
☆268 · Nov 8, 2024 · Updated last year
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,772 · May 26, 2026 · Updated last month
jalammar / ecco
Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the…
☆2,100 · Aug 15, 2024 · Updated last year
jboynyc / textnets
Text analysis with networks.
☆294 · May 14, 2026 · Updated 2 months ago
koaning / embetter
just a bunch of useful embeddings for scikit-learn pipelines
☆527 · Feb 12, 2026 · Updated 5 months ago
jbesomi / texthero
Text preprocessing, representation and visualization from zero to hero.
☆2,910 · Aug 29, 2023 · Updated 2 years ago
KennethEnevoldsen / augmenty
Augmenty is an augmentation library based on spaCy for augmenting texts.
☆156 · May 24, 2024 · Updated 2 years ago
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,039 · Jul 13, 2026 · Updated last week
jfilter / clean-text
🧹 Python package for text cleaning
☆1,026 · May 15, 2026 · Updated 2 months ago
vector-ai / vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.
☆321 · Mar 1, 2024 · Updated 2 years ago
DerwenAI / pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
☆2,218 · Jun 24, 2026 · Updated 3 weeks ago
koaning / whatlies
Toolkit to help understand "what lies" in word embeddings. Also benchmarking!
☆481 · Feb 6, 2023 · Updated 3 years ago
jenojp / negspacy
spaCy pipeline object for negating concepts in text
☆280 · Apr 20, 2026 · Updated 3 months ago
HLasse / TextDescriptives
A Python library for calculating a large variety of metrics from text
☆366 · May 5, 2026 · Updated 2 months ago
nipunsadvilkar / pySBD
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection library that works out of the box.
☆925 · Aug 20, 2024 · Updated last year
koaning / bulk
A Simple Bulk Labelling Tool
☆598 · Jul 29, 2025 · Updated 11 months ago
JohnSnowLabs / nlu
1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
☆968 · Jan 28, 2025 · Updated last year
makcedward / nlpaug
Data augmentation for NLP
☆4,662 · Updated this week
JasonKessler / scattertext
Beautiful visualizations of how language differs among document types.
☆2,336 · Jul 4, 2026 · Updated 2 weeks ago
tomaarsen / SpanMarkerNER
SpanMarker for Named Entity Recognition
☆477 · Apr 10, 2026 · Updated 3 months ago
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,923 · Updated this week
life4 / textdistance
📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
☆3,538 · Apr 18, 2025 · Updated last year
MIND-Lab / OCTIS
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
☆803 · Jun 21, 2026 · Updated 3 weeks ago
PAIR-code / lit
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic …
☆3,657 · Jul 7, 2026 · Updated last week