[ICASSP 2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
☆52 Apr 9, 2025 Updated 11 months ago
Alternatives and similar repositories for VoiceDiT
Users that are interested in VoiceDiT are comparing it to the libraries listed below.
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset ☆12 Sep 29, 2025 Updated 6 months ago
- Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis ☆27 Mar 21, 2025 Updated last year
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆45 Feb 9, 2025 Updated last year
- [ICASSP 2024] Official code for FreGrad ☆35 May 13, 2024 Updated last year
- ☆20 Apr 18, 2024 Updated last year
- VoiceLDM: Text-to-Speech with Environmental Context ☆192 Aug 9, 2024 Updated last year
- Unofficial implementation of ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech ☆21 Feb 9, 2025 Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆43 Jun 13, 2024 Updated last year
- Official repository of "Scalable Neural Vocoder from Range-Null Space Decomposition", submitted to TPAMI. ☆47 Oct 11, 2025 Updated 5 months ago
- ☆24 Jul 15, 2024 Updated last year
- Feed-forward compressor experiments source code for "Differentiable All-pole Filters for Time-varying Audio Systems". ☆22 Jun 10, 2024 Updated last year
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆59 Apr 3, 2025 Updated 11 months ago
- PyTorch implementation of SoundCTM ☆101 Mar 31, 2025 Updated 11 months ago
- Official implementation of paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis ☆51 Sep 20, 2025 Updated 6 months ago
- Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis" ☆14 Nov 5, 2024 Updated last year
- ☆15 Nov 11, 2024 Updated last year
- Self-supervised Generative LM-based Voice Conversion ☆55 Apr 24, 2025 Updated 11 months ago
- PitchVC: Pitch Conditioned Any-to-Many Voice Conversion ☆36 Jun 6, 2024 Updated last year
- [INTERSPEECH 2024] Official PyTorch code for the paper "Disentangled Representation Learning for Environment-agnostic Speaker Recognition… ☆18 Jul 23, 2024 Updated last year
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆152 Sep 14, 2023 Updated 2 years ago
- [WIP] Direction-based Multi-Channel Speech Separation ☆14 Jan 25, 2024 Updated 2 years ago
- ☆67 Aug 16, 2023 Updated 2 years ago
- The source code for the paper CrossSinger (ASRU 2023) ☆18 Oct 12, 2023 Updated 2 years ago
- Unofficial PyTorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (IC… ☆20 May 12, 2023 Updated 2 years ago
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts ☆42 Jun 12, 2025 Updated 9 months ago
- Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction ☆180 Apr 15, 2025 Updated 11 months ago
- Please visit https://thuhcsi.github.io/SnakeGAN/ ☆37 Apr 25, 2023 Updated 2 years ago
- DiffPhase: Generative Diffusion-based STFT Phase Retrieval ☆16 Sep 21, 2023 Updated 2 years ago
- ☆19 Mar 22, 2024 Updated 2 years ago
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec ☆47 May 16, 2025 Updated 10 months ago
- ☆29 Mar 28, 2024 Updated 2 years ago
- Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation ☆139 Mar 8, 2026 Updated 3 weeks ago
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions ☆85 Oct 11, 2024 Updated last year
- VI-SVC model is just VITS without MAS and DurationPredictor. ☆10 Nov 9, 2023 Updated 2 years ago
- Testing sets for semanticVAD ☆20 Feb 18, 2025 Updated last year
- ☆61 Oct 28, 2024 Updated last year
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024) ☆94 Dec 3, 2024 Updated last year
- Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… ☆193 Mar 25, 2024 Updated 2 years ago
- ☆25 Aug 29, 2025 Updated 7 months ago