ag1988 / mel-asr
The accompanying code for "Exploring the limits of decoder-only models trained on public speech recognition corpora" (Ankit Gupta, George Saon, Brian Kingsbury. Interspeech 2024).
☆20 · Oct 11, 2024 · Updated last year
Alternatives and similar repositories for mel-asr
Users who are interested in mel-asr are comparing it to the libraries listed below.
- Interface Design for Self-Supervised Speech Models, Accepted to Interspeech2024 ☆16 · Nov 19, 2024 · Updated last year
- ☆13 · Sep 25, 2024 · Updated last year
- ☆19 · Mar 22, 2024 · Updated last year
- ☆46 · Apr 16, 2023 · Updated 2 years ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 · Jun 2, 2023 · Updated 2 years ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆40 · Aug 29, 2024 · Updated last year
- ESLTTS dataset ☆16 · Feb 6, 2025 · Updated last year
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆46 · Jul 2, 2024 · Updated last year
- An AR+AR TTS attempt. ☆18 · Jan 13, 2025 · Updated last year
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆30 · May 27, 2023 · Updated 2 years ago
- ☆38 · Apr 15, 2024 · Updated last year
- Text-To-Speech for NotebookLM ☆37 · Jul 20, 2025 · Updated 6 months ago
- Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202… ☆27 · May 17, 2023 · Updated 2 years ago
- ☆37 · Jul 4, 2024 · Updated last year
- ☆36 · Mar 14, 2025 · Updated 11 months ago
- A pitch detection model trained to be robust against noise and reverberation environments. ☆27 · Jan 21, 2025 · Updated last year
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec. ☆48 · Sep 4, 2023 · Updated 2 years ago
- Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using ß-VAE" ☆44 · Apr 10, 2023 · Updated 2 years ago
- ☆54 · Jul 16, 2025 · Updated 7 months ago
- Project for HIDING SPEAKER’S SEX IN SPEECH USING ZERO-EVIDENCE SPEAKER REPRESENTATION IN AN ANALYSIS/SYNTHESIS PIPELINE ☆15 · Nov 30, 2022 · Updated 3 years ago
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset ☆12 · Sep 29, 2025 · Updated 4 months ago
- ☆11 · Nov 7, 2024 · Updated last year
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. … ☆13 · Dec 4, 2024 · Updated last year
- High-performance tokenized language data-loader for Python C++ extension ☆14 · Jul 22, 2024 · Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated 10 months ago
- Repository for reproducing result in journal "Self-supervised learning for Speech Emotion Recognition" ☆10 · Mar 15, 2023 · Updated 2 years ago
- ☆10 · Apr 17, 2024 · Updated last year
- ☆13 · Oct 25, 2024 · Updated last year
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》 ☆77 · Jun 9, 2023 · Updated 2 years ago
- ☆25 · Mar 12, 2022 · Updated 3 years ago
- Just another FastSpeech 2 but cleaner code :) ☆29 · Jun 28, 2024 · Updated last year
- ☆44 · Sep 19, 2024 · Updated last year
- Project of Singing Voice Conversion. ☆16 · Oct 27, 2023 · Updated 2 years ago
- Pybind11 bindings for Kaldi ☆15 · Feb 1, 2026 · Updated 2 weeks ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 · Mar 14, 2025 · Updated 11 months ago
- ☆13 · Oct 11, 2024 · Updated last year
- ☆13 · Dec 15, 2025 · Updated 2 months ago
- This is an extension of kaldi speech recognition software which allows to perform decoding of speech with hybrid word and phoneme graphs.… ☆11 · Feb 4, 2020 · Updated 6 years ago
- DPDFNet: causal single-channel speech enhancement that boosts DeepFilterNet2 with dual-path RNN blocks for stronger long-range temporal a… ☆30 · Updated this week