epeake / ModifiedKneserNey
Interpolated Kneser-Ney smoothing with an out-of-vocabulary correction and discount estimated from training data
☆12 · Updated 3 years ago
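For orientation, here is a minimal sketch of bigram-level interpolated Kneser-Ney smoothing with a discount estimated from count statistics. It is not this repository's API: the names `train_kn_bigram` and `prob` are hypothetical, the discount D = n1 / (n1 + 2·n2) is the common textbook estimate and is assumed here, and the out-of-vocabulary correction is not shown.

```python
from collections import Counter


def train_kn_bigram(tokens):
    """Return prob(w, v) ~ P(w | v) under interpolated Kneser-Ney (bigram sketch)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    if not bigrams:
        raise ValueError("need at least two tokens")

    context_counts = Counter()  # c(v): how often v occurs as a context
    followers = Counter()       # N1+(v, .): distinct words seen after v
    preceders = Counter()       # N1+(., w): distinct words seen before w
    for (v, w), c in bigrams.items():
        context_counts[v] += c
        followers[v] += 1
        preceders[w] += 1

    # Discount estimated from the data: D = n1 / (n1 + 2 * n2), where n1 and n2
    # are the numbers of bigram types seen exactly once and exactly twice.
    n1 = sum(1 for c in bigrams.values() if c == 1)
    n2 = sum(1 for c in bigrams.values() if c == 2)
    # Assumption: fall back to 0.75 when neither singletons nor doubletons exist.
    discount = n1 / (n1 + 2 * n2) if (n1 + 2 * n2) > 0 else 0.75

    total_types = len(bigrams)  # N1+(., .)

    def prob(w, v):
        p_cont = preceders[w] / total_types  # continuation probability of w
        c_v = context_counts[v]
        if c_v == 0:
            return p_cont  # unseen context: back off entirely
        lam = discount * followers[v] / c_v  # mass freed by discounting
        return max(bigrams[(v, w)] - discount, 0) / c_v + lam * p_cont

    return prob


prob = train_kn_bigram("a b a c a b b c".split())
total = sum(prob(w, "a") for w in ("a", "b", "c"))
```

In this sketch the probabilities for a seen context sum to one over the words that occur as the second element of some bigram; a word never seen in that position receives zero mass, which is the gap an out-of-vocabulary correction is meant to address.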
Related projects
Alternatives and complementary repositories for ModifiedKneserNey
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆72 · Updated last year
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to… ☆42 · Updated 3 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆33 · Updated 2 years ago
- Phone inventory library ☆15 · Updated last year
- Covering grammars for English and Russian text normalization ☆60 · Updated 5 years ago
- Unsupervised spoken sentence embeddings ☆14 · Updated last year
- Repository for sharing the data in the Tamasheq language, one of the target languages for the low-resource speech translation track at IW… ☆15 · Updated last year
- Python code for converting among IPA, ARPABET, XSAMPA, Callhome, DISC, TIMIT, plus some lexical tones. ☆30 · Updated 9 months ago
- Source code for ASRU 2019 paper "Adapting Pretrained Transformer to Lattices for Spoken Language Understanding" ☆11 · Updated 4 years ago
- Second SIGMORPHON Shared Task on Grapheme-to-Phoneme Conversions ☆22 · Updated 3 years ago
- Adapt Kaldi-ASR nnet3 chain models from Zamia-Speech.org to a different language model ☆34 · Updated 4 years ago
- RNNs for Text Normalization ☆38 · Updated 6 years ago
- Incorporating a KenLM language model with the HuggingFace implementation of the Wav2Vec2CTC model using beam search decoding ☆71 · Updated 3 years ago
- Kaldi-style neural network training in PyTorch for use in place of nnet3 in Kaldi. ☆26 · Updated 4 months ago
- Phonetically-Oriented Word Error Rate ☆33 · Updated 5 years ago
- An adaptation of Fairseq to (End-to-end) speech translation. ☆22 · Updated 2 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Updated 3 years ago
- Links to data used in Sproat & Jaitly (https://arxiv.org/abs/1611.00068) experiments. ☆76 · Updated 3 years ago
- Feature extraction for accented or pathological speech ☆16 · Updated 5 years ago
- Coqui Inference Engine ☆38 · Updated 3 years ago
- Improving Disfluency Detection by Self-Training a Self-Attentive Model ☆47 · Updated 3 years ago
- SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflectio… ☆35 · Updated 3 years ago
- Python library for n-gram models in ARPA format ☆40 · Updated last year
- Python implementation of Levenshtein distance and Levenshtein automata matching ☆27 · Updated 5 years ago
- MaSS - Multilingual corpus of Sentence-aligned Spoken utterances ☆48 · Updated 2 months ago
- A recipe for constituency parsing, disfluency tagging and obtaining the fluent transcripts of the English Fisher dataset ☆12 · Updated 3 years ago
- Python API for reading and querying ARPA formatted language models. ☆33 · Updated 10 years ago