mammothb / editdistpy
Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.
⭐24 · Updated last month
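To make the two supported metrics concrete, here is a minimal pure-Python sketch of Levenshtein distance and Optimal String Alignment (OSA) distance using the standard dynamic-programming recurrences. It is not editdistpy's API or its Cython/C++ implementation. The function names `levenshtein` and `osa` are hypothetical and chosen only for illustration.

```python
# Hypothetical reference functions for illustration; NOT editdistpy's API.


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # prev[j] holds the distance between a[:i-1] and b[:j] (two-row DP table)
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            )
        prev = curr
    return prev[len(b)]


def osa(a: str, b: str) -> int:
    """Levenshtein plus transposition of adjacent characters, where no
    substring may be edited more than once (Optimal String Alignment)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters, e.g. "ca" -> "ac"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]
```

With these definitions, `osa("ca", "ac")` is 1 (a single transposition) while `levenshtein("ca", "ac")` is 2. OSA is also not the same as unrestricted Damerau-Levenshtein distance: `osa("ca", "abc")` is 3, whereas unrestricted Damerau-Levenshtein gives 2, because OSA may not edit a transposed pair again.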
Alternatives and similar repositories for editdistpy
Users who are interested in editdistpy are comparing it to the libraries listed below.
- Execute arbitrary SQL queries on 🤗 Datasets · ⭐32 · Updated last year
- Tokenization across languages. Useful as preprocessing for subword tokenization. · ⭐22 · Updated 2 years ago
- Source code for the Apple reproduction · ⭐32 · Updated 4 years ago
- Rust python bindings for symspell · ⭐19 · Updated last year
- ⭐43 · Updated 2 years ago
- Library for fast text representation and classification. · ⭐30 · Updated last year
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing · ⭐73 · Updated 2 weeks ago
- A simple neural truecaser written in pytorch and allennlp. · ⭐33 · Updated last year
- ⭐28 · Updated 2 years ago
- Python package for deduplication/entity resolution using active learning · ⭐81 · Updated 10 months ago
- Source code and data for Like a Good Nearest Neighbor · ⭐29 · Updated 6 months ago
- Pyinfer is a model-agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… · ⭐24 · Updated 4 years ago
- ⭐34 · Updated 4 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … · ⭐51 · Updated 7 months ago
- ⭐17 · Updated 2 years ago
- Streamlit demo app to demonstrate the features of transformers interpret with multiple models. · ⭐25 · Updated 4 years ago
- Combining encoder-based language models · ⭐11 · Updated 3 years ago
- Find strings/words in text; convenience and C speed · ⭐126 · Updated 2 years ago
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). · ⭐71 · Updated 10 months ago
- CodeSwitch is an NLP tool that can be used for language identification, POS tagging, named entity recognition, sentiment analysis of code-mixed dat… · ⭐35 · Updated 4 years ago
- ⭐30 · Updated 3 years ago
- ISO 639 language codes · ⭐45 · Updated 4 months ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 · ⭐22 · Updated 5 months ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages · ⭐13 · Updated 2 years ago
- Using queues, tqdm-multiprocess supports multiple worker processes, each with multiple tqdm progress bars, displaying them cleanly throug… · ⭐43 · Updated 4 years ago
- A Streamlit component for annotating text by text selecting. · ⭐40 · Updated last year
- ⭐87 · Updated 3 years ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec · ⭐19 · Updated 2 years ago
- RaKUn 2.0 - A fast keyword detection algorithm · ⭐67 · Updated 2 months ago
- ⭐22 · Updated last year