mammothb / editdistpy
Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.
☆23 · Updated 4 months ago
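For readers unfamiliar with the two metrics named in the description, here is a minimal pure-Python sketch of each. It is a reference illustration only: it is not editdistpy's implementation and does not use its API, and the function names `levenshtein` and `osa` are hypothetical. The two metrics differ in that OSA also counts a swap of two adjacent characters as a single edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa(a: str, b: str) -> int:
    """Optimal String Alignment distance: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Adjacent transposition: the last two characters are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, `levenshtein("ab", "ba")` is 2 (two substitutions), while `osa("ab", "ba")` is 1 (one transposition).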
Alternatives and similar repositories for editdistpy:
Users interested in editdistpy are comparing it to the libraries listed below.
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 · Updated 2 years ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 · Updated last year
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated last year
- Source code for the Apple reproduction ☆31 · Updated 3 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Updated 3 years ago
- A simple neural truecaser written in PyTorch and AllenNLP. ☆32 · Updated 7 months ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 · Updated 3 years ago
- Combining encoder-based language models ☆11 · Updated 3 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆34 · Updated 2 years ago
- Code for detecting language from text in Python using fastText ☆13 · Updated 4 years ago
- zero-vocab or low-vocab embeddings ☆18 · Updated 2 years ago
- Self-contained Python package for OpenFst ☆50 · Updated 2 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models ☆32 · Updated 2 years ago
- Post-processing OCR errors with seq2seq models ☆28 · Updated 4 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆31 · Updated 3 years ago
- Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight. ☆66 · Updated last month
- A tiny BERT for low-resource monolingual models ☆31 · Updated 4 months ago
- A small seq2seq punctuator tool based on DistilBERT ☆50 · Updated last month
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ☆15 · Updated 3 years ago
- Keras Implementation of Flair's Contextualized Embeddings ☆26 · Updated 3 years ago
- Code for AccentDB. ☆20 · Updated 3 years ago
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction" ☆18 · Updated 4 years ago
- Dataset Release for Intent Classification from Speech ☆46 · Updated last year
- A collection of basic Python modules for spoken natural language processing ☆56 · Updated 5 years ago
- asr2k ☆49 · Updated 7 months ago
- Unofficial implementation of Adaptive Input in PyTorch ☆12 · Updated 5 years ago
- A flexible sentence segmentation library using CRF model and regex rules ☆28 · Updated 11 months ago