shivammehta25 / Diff-TTSG
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
☆39 Updated last year
Alternatives and similar repositories for Diff-TTSG:
Users who are interested in Diff-TTSG are comparing it to the repositories listed below:
- Facestar dataset. High quality audio-visual recordings of human conversational speech. ☆106 Updated 3 years ago
- Code for "SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces" ACM MM 2023 ☆30 Updated last year
- E2E TTS using Conditional Flow Matching (Experimental*) ☆69 Updated last year
- The project page repo for Neural Dubber. ☆29 Updated last year
- ☆54 Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆25 Updated 3 months ago
- Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024). ☆24 Updated last year
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆47 Updated 2 weeks ago
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP 2023 ☆85 Updated last year
- ☆25 Updated 8 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆22 Updated 6 months ago
- small audio language model for reasoning ☆50 Updated last week
- ☆35 Updated 11 months ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024) ☆56 Updated last month
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆85 Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆64 Updated 5 months ago
- An official implementation of Style-Talker for Spoken Dialogue Generation ☆17 Updated 2 months ago
- Temporary anonymous version ☆22 Updated last year
- Codebase and project page for EDMSound ☆34 Updated last year
- ☆22 Updated 2 months ago
- Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS ☆37 Updated last year
- ☆11 Updated 8 months ago
- ☆25 Updated 2 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆84 Updated 3 months ago
- Audio Demo for "FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation" ☆20 Updated 3 years ago
- VAE modified from Descript Audio Codec, which replaces the RVQ with VAE ☆67 Updated last year
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆32 Updated 4 months ago
- ☆19 Updated 3 years ago
- Official release of StyleTalk dataset. ☆62 Updated 9 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆51 Updated 5 months ago