mammothb / editdistpy
Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.
★ 23 · Updated 7 months ago
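To make the two metrics named in the description concrete, here is a minimal pure-Python sketch of Levenshtein distance and Optimal String Alignment (OSA) distance. It is illustrative only: the function names `levenshtein` and `osa` are hypothetical, and this is not editdistpy's API or its Cython/C++ implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def osa(a: str, b: str) -> int:
    """Levenshtein plus swapping two adjacent characters (cost 1).

    OSA never edits the same substring twice, so it can differ from
    the unrestricted Damerau-Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent transposition: a[i-2:i] is b[j-2:j] reversed.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, `levenshtein("ab", "ba")` is 2 (two substitutions), while `osa("ab", "ba")` is 1 (one transposition).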
Alternatives and similar repositories for editdistpy:
Users who are interested in editdistpy are comparing it to the libraries listed below.
- ★ 17 · Updated last year
- zero-vocab or low-vocab embeddings ★ 18 · Updated 2 years ago
- Execute arbitrary SQL queries on 🤗 Datasets ★ 32 · Updated last year
- Source code for the Apple reproduction ★ 32 · Updated 3 years ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ★ 77 · Updated this week
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ★ 15 · Updated 3 years ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ★ 19 · Updated 2 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models ★ 33 · Updated 2 years ago
- Semeval-2021 Multilingual and Cross-lingual Word-in-Context Task ★ 18 · Updated 3 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ★ 86 · Updated 3 years ago
- ★ 28 · Updated last year
- bin files ★ 13 · Updated 2 months ago
- Implementation of pQRNN in PyTorch ★ 46 · Updated 3 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ★ 22 · Updated 2 years ago
- Combining encoder-based language models ★ 11 · Updated 3 years ago
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ★ 24 · Updated 4 years ago
- A simple neural truecaser written in pytorch and allennlp. ★ 33 · Updated 9 months ago
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction" ★ 18 · Updated 4 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ★ 10 · Updated last year
- Unofficial implementation of Adaptive Input in PyTorch ★ 12 · Updated 6 years ago
- Seed Machine Translation Data ★ 31 · Updated 4 months ago
- WebRED is a large and diverse manually annotated dataset for extracting relationships from a variety of text found on the World Wide Web. ★ 22 · Updated 4 years ago
- Correction of spaces with character-based neural language models. ★ 13 · Updated 2 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ★ 51 · Updated 4 months ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ★ 22 · Updated 3 years ago
- A tiny BERT for low-resource monolingual models ★ 31 · Updated 6 months ago
- An open-source NLP library: fast text cleaning and preprocessing ★ 23 · Updated 3 years ago
- Post-processing OCR errors with seq2seq models ★ 28 · Updated 4 years ago
- Documentation effort for the BookCorpus dataset ★ 34 · Updated 3 years ago
- Multilingual Open Text ★ 25 · Updated 5 months ago