Datasets and tools for basic natural language processing.
☆390 · Sep 10, 2021 · Updated 4 years ago
Alternatives and similar repositories for language-resources
Users who are interested in language-resources are comparing it to the repositories listed below.
- Text-to-Speech tutorial at SLTU 2016 ☆35 · May 10, 2016 · Updated 9 years ago
- ☆213 · Jun 16, 2018 · Updated 7 years ago
- A simple tutorial on setting up Sparrowhawk - a text-to-speech normalization engine ☆14 · Oct 16, 2017 · Updated 8 years ago
- Covering grammars for English and Russian text normalization ☆60 · Sep 15, 2019 · Updated 6 years ago
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment ☆16 · Apr 13, 2022 · Updated 3 years ago
- 💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies ☆1,390 · Jun 6, 2024 · Updated last year
- Read-only unofficial mirror of Pynini ☆17 · May 7, 2019 · Updated 6 years ago
- Massively multilingual pronunciation mining ☆363 · Mar 3, 2026 · Updated 2 weeks ago
- ☆17 · Jul 29, 2018 · Updated 7 years ago
- SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflectio… ☆36 · Apr 25, 2025 · Updated 10 months ago
- 🎯 Speech Recognition Challenge by Speech Lab - IIT Madras ☆10 · Nov 5, 2020 · Updated 5 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Mar 21, 2021 · Updated 4 years ago
- Code for ICASSP 2019 paper ☆18 · Oct 29, 2018 · Updated 7 years ago
- Korean read speech corpus (about 120 hours, 17GB) from National Institute of Korean Language ☆43 · Feb 28, 2018 · Updated 8 years ago
- Thai smart home corpus with "Gowajee" hotword ☆18 · Jul 30, 2023 · Updated 2 years ago
- Read-only unofficial mirror of the OpenGrm Thrax Grammar Development Tools ☆16 · May 2, 2019 · Updated 6 years ago
- A bunch of scripts exploiting several tools to perform inverse text normalization (ITN) ☆21 · Sep 27, 2017 · Updated 8 years ago
- Crawler for linguistic corpora ☆213 · Aug 18, 2025 · Updated 7 months ago
- Automatically exported from code.google.com/p/transducersaurus ☆11 · Apr 1, 2015 · Updated 10 years ago
- CMU Wilderness Multilingual Speech Dataset ☆291 · Apr 20, 2019 · Updated 6 years ago
- Data and code for grapheme-to-phoneme transducers in lots of languages ☆149 · Apr 5, 2024 · Updated last year
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Feb 15, 2024 · Updated 2 years ago
- Awesome Lao Natural Language Processing ☆16 · Mar 7, 2025 · Updated last year
- G2P with Tensorflow ☆681 · Jul 29, 2024 · Updated last year
- ☆14 · Jun 12, 2015 · Updated 10 years ago
- CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed) ☆396 · Sep 14, 2021 · Updated 4 years ago
- Corpus of oral arguments (recorded speech + official transcripts) of the United States Supreme Court ☆22 · Dec 8, 2022 · Updated 3 years ago
- CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages ☆482 · Mar 6, 2020 · Updated 6 years ago
- Simple text to phones converter for multiple languages ☆1,520 · Sep 26, 2024 · Updated last year
- Open tools and data for cloudless automatic speech recognition ☆446 · Mar 30, 2021 · Updated 4 years ago
- A GPU language model, based on btree backed tries. ☆29 · Mar 6, 2018 · Updated 8 years ago
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) ☆801 · Mar 8, 2026 · Updated last week
- This is now the official location of the Merlin project. ☆1,321 · Mar 3, 2020 · Updated 6 years ago
- ☆20 · Jul 22, 2022 · Updated 3 years ago
- Read-only unofficial mirror of OpenFst ☆44 · May 15, 2022 · Updated 3 years ago
- An opensource text-to-speech (TTS) voice building tool ☆684 · Jul 22, 2024 · Updated last year
- ☆45 · Oct 24, 2020 · Updated 5 years ago
- VCTK multi-speaker tacotron for ICASSP 2020 ☆266 · Mar 29, 2022 · Updated 3 years ago
- The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the datase… ☆206 · May 27, 2020 · Updated 5 years ago