mammothb / editdistpy
Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.
☆23 · Updated last week
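
To make the difference between the two supported metrics concrete, here is a minimal pure-Python sketch of both. It is not the package's Cython/C++ implementation and does not use its API; the function names `levenshtein` and `damerau_osa` are chosen here for illustration only. Levenshtein distance counts insertions, deletions, and substitutions. OSA additionally counts a swap of two adjacent characters as a single edit, with the restriction that no substring is edited more than once.

```python
def levenshtein(a: str, b: str) -> int:
    """Reference Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def damerau_osa(a: str, b: str) -> int:
    """Reference Optimal String Alignment distance (full matrix)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Adjacent transposition, counted as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("ab", "ba"), damerau_osa("ab", "ba"))
```

For the swapped pair `"ab"` and `"ba"`, the sketch reports 2 under Levenshtein and 1 under OSA, which is the practical reason to pick OSA for typo-style errors.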
Related projects:
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 · Updated 7 months ago
- Source code for the Apple reproduction ☆30 · Updated 3 years ago
- Self-contained Python package for OpenFst ☆50 · Updated last year
- A simple neural truecaser written in pytorch and allennlp. ☆31 · Updated 3 months ago
- MaSS - Multilingual corpus of Sentence-aligned Spoken utterances ☆48 · Updated this week
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 · Updated last year
- Combining encoder-based language models ☆11 · Updated 2 years ago
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction" ☆18 · Updated 3 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆21 · Updated last year
- zero-vocab or low-vocab embeddings ☆16 · Updated 2 years ago
- Implementation of pQRNN in PyTorch ☆46 · Updated 2 years ago
- Deep neural approach to Boundary and Disfluency Detection - Based on my Master's work ☆19 · Updated last month
- Repository with illustrations for cft-contest-2018 ☆12 · Updated 5 years ago
- Source code for ASRU 2019 paper "Adapting Pretrained Transformer to Lattices for Spoken Language Understanding" ☆11 · Updated 4 years ago
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to… ☆41 · Updated 3 years ago
- ☆16 · Updated this week
- ASR project with pytorch-lightning ☆20 · Updated 4 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆49 · Updated 3 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆24 · Updated 3 years ago
- Dataset Release for Intent Classification from Speech ☆43 · Updated last year
- The collection of building blocks for building fine-tunable metric learning models ☆31 · Updated 2 months ago
- docker for HF wav2vec2-sprint ☆12 · Updated 3 years ago
- ☆86 · Updated 2 years ago
- Python library for n-gram models in ARPA format ☆38 · Updated last year
- ☆17 · Updated last year
- Post-processing OCR errors with seq2seq models ☆28 · Updated 4 years ago
- super fast cpp implementation of longest common subsequence/substring ☆65 · Updated 10 months ago
- ☆33 · Updated 3 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models ☆31 · Updated 2 years ago