mammothb / editdistpy
Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.
☆23 · Updated 2 months ago
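The two supported metrics differ only in how they treat a swap of two adjacent characters. Levenshtein counts it as two edits, while Optimal String Alignment (OSA) counts it as one, provided no substring is edited more than once. The following pure-Python sketch illustrates the definitions. It is not editdistpy's API or its Cython/C++ implementation, and the function names `levenshtein` and `osa` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    # prev is the previous DP row: prev[j] is the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def osa(a: str, b: str) -> int:
    """Optimal String Alignment: Levenshtein plus adjacent transpositions.

    Unlike unrestricted Damerau-Levenshtein, a transposed pair cannot be
    edited again, so OSA is not a true metric (triangle inequality can fail).
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Transposition of two adjacent characters costs a single edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


if __name__ == "__main__":
    print(levenshtein("abcd", "abdc"))  # 2: two substitutions
    print(osa("abcd", "abdc"))          # 1: one adjacent transposition
    print(osa("CA", "ABC"))             # 3: the swapped pair cannot be edited again
```

The last call shows the OSA restriction: unrestricted Damerau-Levenshtein would give 2 for "CA" vs "ABC", but OSA gives 3. For the library's actual module and function names (and any early-exit threshold it may offer), consult the editdistpy documentation rather than this sketch.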
Related projects
Alternatives and complementary repositories for editdistpy
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 · Updated 9 months ago
- Combining encoder-based language models ☆11 · Updated 3 years ago
- Source code for the Apple reproduction ☆31 · Updated 3 years ago
- ☆28 · Updated last year
- Zero-vocab or low-vocab embeddings ☆17 · Updated 2 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated last year
- A tiny BERT for low-resource monolingual models ☆29 · Updated last month
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction" ☆18 · Updated 3 years ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ☆15 · Updated 3 years ago
- ☆86 · Updated 2 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 · Updated 3 years ago
- A simple neural truecaser written in PyTorch and AllenNLP. ☆32 · Updated 5 months ago
- Repository with illustrations for cft-contest-2018 ☆12 · Updated 6 years ago
- Multilingual Open Text ☆25 · Updated 3 weeks ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 · Updated last year
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆21 · Updated last month
- A library for data streaming and augmentation ☆20 · Updated 8 months ago
- PALI: Language identification for Perso-Arabic Scripts ☆9 · Updated last year
- TorchServe+Streamlit for easily serving your HuggingFace NER models ☆31 · Updated 2 years ago
- C++ mosestokenizer ☆16 · Updated 8 months ago
- Robust Cross-lingual Embeddings from Parallel Sentences ☆20 · Updated 4 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated 3 months ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 · Updated 2 years ago
- Neural network sequence labeling model ☆11 · Updated 4 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆49 · Updated 4 years ago
- A web application that interfaces two GEC systems. [web instance is down] ☆31 · Updated 3 months ago
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ☆27 · Updated 3 years ago
- A collection of scripts to preprocess ASR datasets and fine-tune language-specific Wav2Vec2 XLSR models ☆31 · Updated 3 years ago