☆80Feb 24, 2026Updated last week
Alternatives and similar repositories for MingTok-Audio
Users that are interested in MingTok-Audio are comparing it to the libraries listed below
Sorting:
- Official implementation of paper: Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-…☆40Sep 18, 2024Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 10 months ago
- PASE: Phonologically Anchored Speech Enhancer☆38Dec 10, 2025Updated 2 months ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition☆11Mar 14, 2025Updated 11 months ago
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- Official code for SongEcho☆41Feb 21, 2026Updated last week
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.☆91May 25, 2023Updated 2 years ago
- Evaluation tool used in the BigVSAN paper☆14Mar 22, 2024Updated last year
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment☆16Apr 13, 2022Updated 3 years ago
- The implementation of paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody"☆34Nov 23, 2023Updated 2 years ago
- Official Repository of UltraVoice☆58Oct 28, 2025Updated 4 months ago
- Code of our ISMIR 2025 paper - D. Afchar, G. Meseguer Brocal, K. Akesbi, R. Hennequin☆34Nov 12, 2025Updated 3 months ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨☆17May 20, 2025Updated 9 months ago
- ☆18Jan 17, 2022Updated 4 years ago
- Official Implementation for the paper: A Variational Framework for Improving Naturalness in Generative Spoken Language Models☆22Jun 18, 2025Updated 8 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆43Mar 3, 2025Updated last year
- The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)☆49Aug 15, 2025Updated 6 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated 11 months ago
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models'☆20Jul 24, 2024Updated last year
- babyLM WhisBERT code☆19May 27, 2024Updated last year
- Speech enhancement in noisy and reverberant environments using deep neural networks☆22Oct 10, 2025Updated 4 months ago
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆34Sep 9, 2025Updated 5 months ago
- ☆24Jun 13, 2022Updated 3 years ago
- Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using ß-VAE"☆44Apr 10, 2023Updated 2 years ago
- ☆45Jun 11, 2024Updated last year
- ☆49Apr 1, 2025Updated 11 months ago
- Implementation of multi-level Contrastive Predictive Coding (CPC) methods☆20Jan 12, 2023Updated 3 years ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture.☆51Mar 17, 2025Updated 11 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding☆51May 1, 2025Updated 10 months ago
- ☆24Mar 13, 2020Updated 5 years ago
- Repository for the paper "Combining audio control and style transfer using latent diffusion", accepted at ISMIR 2024☆63Feb 19, 2025Updated last year
- Pytorch implementation of SoundCTM☆101Mar 31, 2025Updated 11 months ago
- Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.☆53Apr 16, 2025Updated 10 months ago
- Official implemention for Diffusion Models Are Innate One-Step Generators☆26Jun 25, 2025Updated 8 months ago
- My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one☆26Aug 5, 2024Updated last year
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"☆26Mar 27, 2024Updated last year
- Simple and lightweight Zero-shot Text-to-Speech (TTS) synthesis model☆36Apr 29, 2025Updated 10 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆101Jul 24, 2024Updated last year
- Official implementation of our paper "Bidirectional Consistency Models"; and reproduced Improved Consistency Models (iCT).☆27May 10, 2025Updated 9 months ago