shivammehta25 / Diff-TTSG
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
☆39 Updated last year
Alternatives and similar repositories for Diff-TTSG:
Users who are interested in Diff-TTSG are comparing it to the repositories listed below.
- trying to reproduce suno v3 ☆27 Updated this week
- ☆11 Updated 6 months ago
- ☆34 Updated 9 months ago
- E2E TTS using Conditional Flow Matching (Experimental*) ☆69 Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆25 Updated last month
- ☆25 Updated 5 months ago
- Facestar dataset. High quality audio-visual recordings of human conversational speech. ☆105 Updated 2 years ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆21 Updated 4 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆74 Updated last month
- ☆25 Updated 2 years ago
- An AR+AR TTS attempt. ☆13 Updated 2 weeks ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆79 Updated 9 months ago
- Official release of StyleTalk dataset. ☆60 Updated 6 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆60 Updated 2 months ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024) ☆41 Updated 9 months ago
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code ☆10 Updated 2 years ago
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP2023 ☆85 Updated last year
- ☆50 Updated last year
- GPT-style network for phonemization with durations of text ☆64 Updated 10 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆42 Updated 7 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆91 Updated 3 months ago
- ☆37 Updated 2 weeks ago
- VoiceBank-2023 is the speech corpus specially designed for constructing personalized Mandarin text-to-speech (TTS) systems. ☆39 Updated last year
- ☆20 Updated last week
- ESLTTS dataset ☆16 Updated last week
- ☆18 Updated 8 months ago
- Zero-Shot Emotion Style Transfer ☆41 Updated 9 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆27 Updated 2 months ago
- GPT for FACodec ☆13 Updated 10 months ago
- ☆33 Updated 2 months ago