Datasets and tools for basic natural language processing.
☆388 · Sep 10, 2021 · Updated 4 years ago
Alternatives and similar repositories for language-resources
Users who are interested in language-resources are comparing it to the libraries listed below.
- Text-to-Speech tutorial at SLTU 2016 · ☆35 · May 10, 2016 · Updated 9 years ago
- Covering grammars for English and Russian text normalization · ☆61 · Sep 15, 2019 · Updated 6 years ago
- ☆213 · Jun 16, 2018 · Updated 7 years ago
- A simple tutorial on setting up Sparrowhawk - a text-to-speech normalization engine · ☆14 · Oct 16, 2017 · Updated 8 years ago
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment · ☆16 · Apr 13, 2022 · Updated 3 years ago
- SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflectio… · ☆36 · Apr 25, 2025 · Updated 10 months ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler · ☆23 · Mar 21, 2021 · Updated 4 years ago
- 💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies · ☆1,386 · Jun 6, 2024 · Updated last year
- Massively multilingual pronunciation mining · ☆362 · Jan 13, 2026 · Updated last month
- Korean read speech corpus (about 120 hours, 17GB) from National Institute of Korean Language · ☆43 · Feb 28, 2018 · Updated 8 years ago
- A bunch of scripts exploiting several tools to perform inverse text normalization (ITN) · ☆21 · Sep 27, 2017 · Updated 8 years ago
- An extension of the Kaldi speech recognition software that allows decoding of speech with hybrid word and phoneme graphs.… · ☆11 · Feb 4, 2020 · Updated 6 years ago
- Thai smart home corpus with "Gowajee" hotword · ☆18 · Jul 30, 2023 · Updated 2 years ago
- Code for ICASSP 2019 paper · ☆18 · Oct 29, 2018 · Updated 7 years ago
- Data and code for grapheme-to-phoneme transducers in lots of languages · ☆147 · Apr 5, 2024 · Updated last year
- ☆14 · Jun 12, 2015 · Updated 10 years ago
- Links to data used in Sproat & Jaitly (https://arxiv.org/abs/1611.00068) experiments. · ☆77 · Jul 9, 2021 · Updated 4 years ago
- CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed) · ☆396 · Sep 14, 2021 · Updated 4 years ago
- Custom decoders for Kaldi · ☆80 · Jun 10, 2019 · Updated 6 years ago
- CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages · ☆482 · Mar 6, 2020 · Updated 5 years ago
- ☆45 · Oct 24, 2020 · Updated 5 years ago
- ICASSP 2020 ESPnet-TTS: Merlin baseline system · ☆36 · Oct 28, 2019 · Updated 6 years ago
- Corpus of oral arguments (recorded speech + official transcripts) of the United States Supreme Court · ☆22 · Dec 8, 2022 · Updated 3 years ago
- G2P with Tensorflow · ☆680 · Jul 29, 2024 · Updated last year
- A library for speech data augmentation in time-domain · ☆682 · Aug 30, 2021 · Updated 4 years ago
- Small language toolkit for creation, interpolation and pruning of ARPA language models · ☆92 · Aug 6, 2022 · Updated 3 years ago
- A GPU language model, based on btree backed tries. · ☆29 · Mar 6, 2018 · Updated 7 years ago
- A pure python module for reading and writing kaldi ark files · ☆267 · Mar 6, 2025 · Updated 11 months ago
- 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi. · ☆22 · Jul 12, 2019 · Updated 6 years ago
- CMU Wilderness Multilingual Speech Dataset · ☆291 · Apr 20, 2019 · Updated 6 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. · ☆27 · Feb 15, 2024 · Updated 2 years ago
- Open tools and data for cloudless automatic speech recognition · ☆446 · Mar 30, 2021 · Updated 4 years ago
- ☆17 · Jul 29, 2018 · Updated 7 years ago
- PyTorch implementation of Retriever: Learning Content-Style Representation · ☆12 · Jan 27, 2023 · Updated 3 years ago
- 🎯 Speech Recognition Challenge by Speech Lab - IIT Madras · ☆11 · Nov 5, 2020 · Updated 5 years ago
- ☆20 · Jul 22, 2022 · Updated 3 years ago
- The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the datase… · ☆206 · May 27, 2020 · Updated 5 years ago
- SelfRemaster: SSL Speech Restoration · ☆94 · Jan 5, 2024 · Updated 2 years ago
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) · ☆799 · Dec 24, 2025 · Updated 2 months ago