HumanMLLM / CoGenAV
☆44 · Updated last month
Alternatives and similar repositories for CoGenAV
Users interested in CoGenAV are comparing it to the libraries listed below.
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆42 · Updated 2 months ago
- ☆61 · Updated last week
- ☆33 · Updated 2 months ago
- ViSpeR: Multilingual Audio-Visual Speech Recognition ☆39 · Updated 2 months ago
- LLaVA combined with the MAGVIT image tokenizer: training an MLLM without a vision encoder, unifying image understanding and generation ☆37 · Updated last year
- [AAAI 2025] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization ☆50 · Updated 6 months ago
- INTERSPEECH 2023: Target Active Speaker Detection with Audio-visual Cues ☆52 · Updated 2 years ago
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL 2024) ☆45 · Updated 10 months ago
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization ☆53 · Updated 2 months ago
- ☆72 · Updated last month
- Official source code for the paper EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing ☆18 · Updated 3 weeks ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 · Updated 10 months ago
- Small audio language model for reasoning ☆64 · Updated 2 months ago
- ☆13 · Updated last year
- Code for the ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?" ☆24 · Updated 11 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆73 · Updated 7 months ago
- A tiny project for ASR model training and deployment ☆27 · Updated 2 years ago
- ☆19 · Updated 5 months ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024) ☆63 · Updated 4 months ago
- Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024) ☆24 · Updated last year
- Code repository for LoCoNet: Long-Short Context Network for Active Speaker Detection ☆36 · Updated 2 years ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆67 · Updated 2 weeks ago
- ☆11 · Updated 4 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units ☆40 · Updated 7 months ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆40 · Updated last month
- Code for "SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces" (ACM MM 2023) ☆30 · Updated last year
- The official implementation of VITA, VITA-1.5, LongVITA, and VITA-Audio ☆32 · Updated last month
- ☆17 · Updated 2 years ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆46 · Updated last year
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis ☆39 · Updated last year