google/language-resources

Datasets and tools for basic natural language processing.

☆389

Alternatives and similar repositories for language-resources

Users who are interested in language-resources are comparing it to the repositories listed below.

google / sparrowhawk
☆215 · Jun 16, 2018 · Updated 8 years ago
mjansche / tts-tutorial
Text-to-Speech tutorial at SLTU 2016
☆35 · May 10, 2016 · Updated 10 years ago
danijel3 / SparrowhawkTest
A simple tutorial on setting up Sparrowhawk - a text-to-speech normalization engine
☆14 · Oct 16, 2017 · Updated 8 years ago
google-research-datasets / TextNormalizationCoveringGrammars
Covering grammars for English and Russian text normalization
☆61 · Sep 15, 2019 · Updated 6 years ago
miccio-dk / NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16 · Apr 13, 2022 · Updated 4 years ago
coqui-ai / open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
☆1,397 · Jun 6, 2024 · Updated 2 years ago
mjansche / pynini
Read-only unofficial mirror of Pynini
☆17 · May 7, 2019 · Updated 7 years ago
CUNY-CL / wikipron
Massively multilingual pronunciation mining
☆370 · Jul 13, 2026 · Updated last week
Speech-Lab-IITM / Hindi-ASR-Challenge
🎯 Speech Recognition Challenge by Speech Lab - IIT Madras
☆10 · Nov 5, 2020 · Updated 5 years ago
google / asr-recipes
☆17 · Jul 29, 2018 · Updated 7 years ago
sigmorphon / 2020
SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflectio…
☆36 · Apr 25, 2025 · Updated last year
mjansche / thrax
Read-only unofficial mirror of the OpenGrm Thrax Grammar Development Tools
☆16 · May 2, 2019 · Updated 7 years ago
ekapolc / gowajee_corpus
Thai smart home corpus with "Gowajee" hotword
☆19 · Jul 30, 2023 · Updated 2 years ago
Prem-kumar27 / Fast-KTSpeechCrawler
Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler
☆23 · Mar 21, 2021 · Updated 5 years ago
markusdr / transducersaurus
Automatically exported from code.google.com/p/transducersaurus
☆11 · Apr 1, 2015 · Updated 11 years ago
google / corpuscrawler
Crawler for linguistic corpora
☆216 · Aug 18, 2025 · Updated 11 months ago
homink / speech.ko
Korean read speech corpus (about 120 hours, 17GB) from National Institute of Korean Language
☆43 · Feb 28, 2018 · Updated 8 years ago
ljuvela / multiscale-GAN
Code for ICASSP 2019 paper
☆18 · Oct 29, 2018 · Updated 7 years ago
festvox / datasets-CMU_Wilderness
CMU Wilderness Multilingual Speech Dataset
☆292 · Apr 20, 2019 · Updated 7 years ago
wannaphong / Awesome-Lao-NLP
Awesome Lao Natural Language Processing
☆19 · Mar 7, 2025 · Updated last year
uiuc-sst / g2ps
Data and code for grapheme-to-phoneme transducers in lots of languages
☆152 · Apr 5, 2024 · Updated 2 years ago
cmusphinx / g2p-seq2seq
G2P with Tensorflow
☆680 · Jul 29, 2024 · Updated last year
bootphon / phonemizer
Simple text to phones converter for multiple languages
☆1,557 · Sep 26, 2024 · Updated last year
facebookresearch / covost
CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed)
☆401 · Sep 14, 2021 · Updated 4 years ago
idiap / inv-tn
A bunch of scripts exploiting several tools to perform inverse text normalization (ITN)
☆21 · Sep 27, 2017 · Updated 8 years ago
XapaJIaMnu / gLM
A GPU language model, based on btree backed tries.
☆30 · Mar 6, 2018 · Updated 8 years ago
Kyubyong / css10
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
☆490 · Mar 6, 2020 · Updated 6 years ago
gooofy / zamia-speech
Open tools and data for cloudless automatic speech recognition
☆449 · Mar 30, 2021 · Updated 5 years ago
mjansche / openfst
Read-only unofficial mirror of OpenFst
☆44 · May 15, 2022 · Updated 4 years ago
NTRLab / MediaSpeech
☆22 · Jul 22, 2022 · Updated 3 years ago
athena-team / DiDiSpeech
☆45 · Oct 24, 2020 · Updated 5 years ago
dmort27 / epitran
A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
☆824 · Jun 18, 2026 · Updated last month
google / voice-builder
An opensource text-to-speech (TTS) voice building tool
☆687 · Jul 22, 2024 · Updated last year
CSTR-Edinburgh / merlin
This is now the official location of the Merlin project.
☆1,320 · Mar 3, 2020 · Updated 6 years ago
wannaphong / LaoNLP
Lao language Natural Language Processing toolkit
☆35 · Jan 9, 2026 · Updated 6 months ago
Open-Speech-EkStep / indic-punct
☆45 · Dec 15, 2022 · Updated 3 years ago
r9y9 / icassp2020-espnet-tts-merlin-baseline
ICASSP 2020 ESPnet-TTS: Merlin baseline system
☆37 · Oct 28, 2019 · Updated 6 years ago
egorsmkv / asr-corpus-creator
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
☆27 · Feb 15, 2024 · Updated 2 years ago
facebookresearch / WavAugment
A library for speech data augmentation in time-domain
☆689 · Aug 30, 2021 · Updated 4 years ago