shivammehta25 / Diff-TTSG
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
☆37 · Updated last year
Related projects
Alternatives and complementary repositories for Diff-TTSG
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆51 · Updated last month
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP2023 ☆83 · Updated last year
- Facestar dataset. High quality audio-visual recordings of human conversational speech. ☆104 · Updated 2 years ago
- Code for "SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces" ACM MM 2023 ☆30 · Updated last year
- trying to reproduce suno v3 ☆25 · Updated 7 months ago
- Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS ☆35 · Updated last year
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆37 · Updated 2 weeks ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆18 · Updated 2 months ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆70 · Updated 7 months ago
- [ICLR2022] Code for "Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph" ☆53 · Updated 2 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆66 · Updated last week
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆42 · Updated 2 months ago
- E2E TTS using Conditional Flow Matching (Experimental*) ☆66 · Updated last year
- WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching ☆26 · Updated 3 weeks ago
- Codebase and project page for EDMSound ☆29 · Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆56 · Updated 2 weeks ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆83 · Updated 3 weeks ago
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24) ☆33 · Updated 3 months ago
- [ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆50 · Updated last week
- Efficient synchronization from sparse cues ☆28 · Updated 6 months ago
- This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAsses … ☆14 · Updated 10 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆57 · Updated 2 months ago
- ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models (TTS) ☆10 · Updated 8 months ago
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆21 · Updated 3 months ago