shivammehta25 / Diff-TTSG
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
☆38 · Updated last year
Related projects:
- Code for "Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction" (ACL 2024) ☆25 · Updated last month
- [ICLR 2022] Code for "Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph" ☆53 · Updated last year
- Facestar dataset: high-quality audio-visual recordings of human conversational speech. ☆99 · Updated 2 years ago
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" (ICASSP 2023) ☆82 · Updated 11 months ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆64 · Updated 5 months ago
- VoiceBank-2023: a speech corpus designed specifically for building personalized Mandarin text-to-speech (TTS) systems. ☆36 · Updated last year
- Official release of the StyleTalk dataset. ☆53 · Updated 2 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆33 · Updated 3 months ago
- Unsupervised Rhythm Modeling for Voice Conversion ☆78 · Updated last year
- Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS ☆31 · Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆20 · Updated last month
- Official code for the CVPR 2024 paper Diff-BGM ☆38 · Updated 5 months ago
- Project page repository for Neural Dubber. ☆27 · Updated 11 months ago
- Code for "SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces" (ACM MM 2023) ☆30 · Updated last year
- ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models (TTS) ☆9 · Updated 6 months ago
- Codebase and project page for EDMSound ☆29 · Updated 9 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆68 · Updated 2 months ago
- Efficient synchronization from sparse cues ☆25 · Updated 4 months ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆40 · Updated 2 months ago
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs ☆33 · Updated last week
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (AAAI 2024) ☆48 · Updated 3 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆52 · Updated 3 weeks ago