google / language-resources
Datasets and tools for basic natural language processing.
☆373 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for language-resources
- CMU Wilderness Multilingual Speech Dataset · ☆272 · Updated 5 years ago
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) · ☆655 · Updated 2 months ago
- A GitHub repository of the abandonware Sequitur G2P by Bisani & Ney · ☆155 · Updated 4 months ago
- ☆205 · Updated 6 years ago
- Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features · ☆221 · Updated 3 months ago
- A tool for automatic phoneme transcription · ☆157 · Updated last year
- Phonetisaurus G2P · ☆453 · Updated 5 months ago
- Massively multilingual pronunciation mining · ☆321 · Updated this week
- 🙊 software for creating speech recognition models · ☆152 · Updated 5 months ago
- Automatically constructing a corpus for automatic speech recognition from YouTube videos · ☆153 · Updated 4 years ago
- Data and code for grapheme-to-phoneme transducers in lots of languages · ☆130 · Updated 7 months ago
- Crawler for linguistic corpora · ☆192 · Updated 11 months ago
- CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages · ☆465 · Updated 4 years ago
- Universal Romanizer that can convert any Unicode script to Roman (Latin) script · ☆154 · Updated 3 months ago
- DeepSpeech-based forced alignment tool · ☆235 · Updated 3 years ago
- A module for normalising text · ☆173 · Updated 3 years ago
- Command-line tool to create corpora for Common Voice · ☆75 · Updated 5 months ago
- Tool for creation, manipulation and maintenance of voice corpora · ☆81 · Updated 6 months ago
- Dataset for lightly supervised training using the LibriVox audiobook recordings (https://librivox.org/) · ☆480 · Updated last year
- Speaker diarization scripts based on AaltoASR · ☆190 · Updated 5 years ago
- Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text · ☆235 · Updated 5 years ago
- PyTorch code for end-to-end spoken language understanding (SLU) with ASR-based transfer learning · ☆225 · Updated 3 years ago
- Server framework for Kaldi ASR Toolkit · ☆97 · Updated last year
- Small language toolkit for creation, interpolation and pruning of ARPA language models · ☆90 · Updated 2 years ago
- A list of publicly available audio data that anyone can download for ASR or other speech tasks · ☆200 · Updated 3 years ago
- Covering grammars for English and Russian text normalization · ☆60 · Updated 5 years ago
- Grapheme To Phoneme · ☆70 · Updated 3 months ago
- Grapheme-to-phoneme conversion with deep learning · ☆358 · Updated 11 months ago
- g2p: English Grapheme To Phoneme Conversion · ☆812 · Updated last year
- Linguistic processing for Common Voice · ☆52 · Updated 10 months ago