Dataset for lightly supervised training using the LibriVox audiobook recordings (https://librivox.org/).
☆521 Jul 11, 2023 Updated 2 years ago
Alternatives and similar repositories for libri-light
Users that are interested in libri-light are comparing it to the libraries listed below
- A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation ☆564 Apr 2, 2023 Updated 2 years ago
- Libriheavy: a 50,000 hours ASR corpus with punctuation, casing and context ☆214 Sep 10, 2024 Updated last year
- An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion. ☆368 Oct 12, 2021 Updated 4 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit ☆2,530 Jun 13, 2025 Updated 8 months ago
- A library for speech data augmentation in time-domain ☆683 Aug 30, 2021 Updated 4 years ago
- Large, modern dataset for speech recognition ☆721 Feb 26, 2024 Updated 2 years ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆152 Sep 14, 2023 Updated 2 years ago
- UniSpeech - Large Scale Self-Supervised Learning for Speech ☆479 Apr 5, 2024 Updated last year
- This is the GitHub page for publicly available emotional speech data. ☆381 Jan 6, 2022 Updated 4 years ago
- Tools for handling multimodal data in machine learning projects. ☆1,114 Updated this week
- An Open-source Streaming High-fidelity Neural Audio Codec ☆498 Mar 4, 2025 Updated 11 months ago
- 💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies ☆1,386 Jun 6, 2024 Updated last year
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a… ☆647 Jun 9, 2024 Updated last year
- ☆276 Jan 15, 2021 Updated 5 years ago
- LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning ☆159 Jun 13, 2024 Updated last year
- Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning ☆191 Jan 29, 2020 Updated 6 years ago
- ☆390 Sep 3, 2024 Updated last year
- Audio Codec Speech processing Universal PERformance Benchmark ☆297 Jan 8, 2026 Updated last month
- CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed) ☆396 Sep 14, 2021 Updated 4 years ago
- g2p: English Grapheme To Phoneme Conversion ☆911 Jan 5, 2023 Updated 3 years ago
- List of speech synthesis papers. ☆1,067 Jul 24, 2023 Updated 2 years ago
- ☆25 Mar 12, 2022 Updated 3 years ago
- [ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations ☆142 Apr 27, 2024 Updated last year
- End-to-end ASR/LM implementation with PyTorch ☆594 Aug 30, 2021 Updated 4 years ago
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis ☆2,320 Jul 27, 2024 Updated last year
- Official implementation of the source-filter HiFiGAN vocoder ☆268 Jul 29, 2023 Updated 2 years ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 Aug 5, 2024 Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 Nov 1, 2024 Updated last year
- Training code for FAcodec presented in NaturalSpeech3 ☆239 Aug 26, 2024 Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆78 Nov 1, 2024 Updated last year
- ☆259 May 15, 2023 Updated 2 years ago
- An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-S… ☆415 Aug 29, 2023 Updated 2 years ago
- ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model ☆213 Apr 26, 2024 Updated last year
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching" ☆367 Sep 3, 2024 Updated last year
- Unofficial implementation of NVIDIA P-Flow TTS paper ☆230 Dec 24, 2024 Updated last year
- A Generative Flow for Text-to-Speech via Monotonic Alignment Search ☆702 Jul 12, 2022 Updated 3 years ago
- State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio. ☆1,718 Jan 26, 2026 Updated last month
- UT-Sarulab MOS prediction system using SSL models ☆296 Apr 11, 2024 Updated last year
- Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch ☆1,638 Apr 22, 2024 Updated last year