vskadandale / vocalist
Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
☆73Apr 7, 2024Updated last year
Alternatives and similar repositories for vocalist
Users that are interested in vocalist are comparing it to the libraries listed below
- Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024).☆28Apr 3, 2024Updated last year
- deep-learning based audio-visual lip biometrics☆15May 9, 2023Updated 2 years ago
- Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)☆54Jan 29, 2024Updated 2 years ago
- ☆428Nov 1, 2023Updated 2 years ago
- Textless Speech-to-Music Retrieval Using Emotion Similarity [ICASSP23]☆17Aug 16, 2023Updated 2 years ago
- PyTorch implementation of "StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator"☆214Aug 8, 2023Updated 2 years ago
- Out of time: automated lip sync in the wild☆870Jan 23, 2024Updated 2 years ago
- This is the release code for CVPR2022 paper "Voice-Face Homogeneity Tells Deepfake".☆15Mar 7, 2022Updated 3 years ago
- [ECCV 2022] StyleHEAT: A framework for high-resolution editable talking face generation☆658Mar 26, 2023Updated 2 years ago
- Parallel and High-Fidelity Text-to-Lip Generation; AAAI 2022 ; Official code☆109May 1, 2022Updated 3 years ago
- Unofficial LivePortrait Training Script [ 🚧 Under Construction]☆38Jan 28, 2025Updated last year
- Official pytorch implementation for Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (CVPR 2022)☆126Aug 18, 2024Updated last year
- Official implementation of Transpotter, published in BMVC 2021☆16Aug 6, 2022Updated 3 years ago
- [ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation☆24Mar 25, 2025Updated 10 months ago
- Official implementation of A cappella: Audio-visual Singing Voice Separation, from BMVC21☆16May 14, 2022Updated 3 years ago
- ☆526Dec 26, 2023Updated 2 years ago
- Disentangled Speech Embeddings using Cross-Modal Self-Supervision☆166Apr 12, 2020Updated 5 years ago
- A self-supervised learning framework for audio-visual speech☆969Dec 7, 2023Updated 2 years ago
- ☆38Apr 15, 2024Updated last year
- FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning. ICCV, 2021.☆383Jun 30, 2022Updated 3 years ago
- Audio-Visual Speech Recognition using Sequence to Sequence Models☆83Jul 10, 2020Updated 5 years ago
- ☆24Feb 20, 2024Updated last year
- The MAVD represents Mandarin Audio-Visual dataset with Depth information. MAVD has a rich variety of modal data, including audio, RGB ima…☆20Apr 22, 2024Updated last year
- ☆38Nov 10, 2024Updated last year
- Official Pytorch Implementation of SPECTRE: Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos☆293Mar 24, 2025Updated 10 months ago
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)☆59Jun 20, 2024Updated last year
- JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech☆113Jun 6, 2022Updated 3 years ago
- Code for "Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions"☆22Dec 24, 2019Updated 6 years ago
- ☆21Mar 31, 2022Updated 3 years ago
- GPT-style network for phonemization with durations of text☆68Mar 21, 2024Updated last year
- [NeurIPS 2024] This is the official repo of the paper "Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Li…☆135Feb 9, 2025Updated last year
- Official Implementation of Multi-Stage Multi-Codebook (MSMC) TTS☆167Apr 10, 2024Updated last year
- ☆101Oct 30, 2025Updated 3 months ago
- SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music☆27Nov 9, 2023Updated 2 years ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one☆26Aug 5, 2024Updated last year
- Code for paper 'EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model'☆201Apr 28, 2023Updated 2 years ago
- Project of "Adaptive Affine Transformation: A Simple and Effective Operation for Spatial Misaligned Image Generation"☆64Mar 1, 2023Updated 2 years ago
- ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…☆431May 18, 2023Updated 2 years ago
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge☆21Jul 25, 2022Updated 3 years ago