alicank / Translation-Augmented-LibriSpeech-Corpus
A large-scale (>200h), publicly available corpus of read audiobooks. It augments the LibriSpeech ASR Corpus (1000h) and contains English utterances from audiobooks, automatically aligned with French text. The dataset provides ~236h of speech aligned to translated text.
☆44 · Updated 3 years ago
Alternatives and similar repositories for Translation-Augmented-LibriSpeech-Corpus
Users who are interested in Translation-Augmented-LibriSpeech-Corpus are comparing it to the repositories listed below.
- The Fisher and CALLHOME Spanish–English Speech Translation Corpus · ☆40 · Updated 3 years ago
- An adaptation of Fairseq to (end-to-end) speech translation. · ☆22 · Updated 3 years ago
- Speech2vec pre-trained word vectors · ☆76 · Updated 7 years ago
- Covering grammars for English and Russian text normalization · ☆60 · Updated 6 years ago
- RNNs for text normalization · ☆39 · Updated 7 years ago
- ☆15 · Updated 6 years ago
- Links to data used in the Sproat & Jaitly (https://arxiv.org/abs/1611.00068) experiments. · ☆76 · Updated 4 years ago
- A spoken question answering dataset based on SQuAD · ☆49 · Updated 5 months ago
- Deep learning systems for training and testing disfluency detection and related tasks on speech data. · ☆60 · Updated this week
- This repository contains data used in the NAACL 2021 paper "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to…" · ☆45 · Updated 4 years ago
- A phoneme-allophone database for many languages · ☆52 · Updated 5 years ago
- SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflectio… · ☆36 · Updated 6 months ago
- Kaldi-style neural network training in PyTorch, for use in place of nnet3 in Kaldi. · ☆26 · Updated last year
- This repository describes our reproducible framework for assessing self-supervised representation learning from speech · ☆51 · Updated 4 years ago
- A neural machine translation toolkit for research purposes · ☆82 · Updated 8 months ago
- Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020) · ☆103 · Updated 2 years ago
- Python API for reading and querying ARPA-formatted language models. · ☆33 · Updated 11 years ago
- Training an n-gram-based language model using the KenLM toolkit for Deep Speech 2 · ☆114 · Updated 6 years ago
- ☆45 · Updated 6 years ago
- An efficient implementation of RNN-T prefix beam search in C++/CUDA. · ☆68 · Updated 4 years ago
- Repository for the SLURP paper · ☆106 · Updated 3 years ago
- Grapheme-to-phoneme model for PyTorch · ☆40 · Updated 3 years ago
- Support tools for punctuation and boundary detection for ASR output. · ☆56 · Updated 2 years ago
- Helsinki Prosody Corpus and a system for predicting prosodic prominence from text · ☆245 · Updated 5 years ago
- Improving Disfluency Detection by Self-Training a Self-Attentive Model · ☆47 · Updated 4 years ago
- SHAS: Approaching optimal Segmentation for End-to-End Speech Translation · ☆40 · Updated 2 years ago
- A set of scripts that use several tools to perform inverse text normalization (ITN) · ☆21 · Updated 8 years ago
- Code for the SLT 2016 paper on grapheme-to-phoneme conversion using attention-based encoder-decoder models · ☆15 · Updated 6 years ago
- Small language toolkit for creating, interpolating, and pruning ARPA language models · ☆92 · Updated 3 years ago
- BERT and LSTM baseline models of the ZeroSpeech Challenge 2021 · ☆60 · Updated 3 years ago