ZZDoog / Avatar
Avatar: An easy-to-use digital portrait PPT presentation video generation system based on Gradio
☆20 Updated last year
Alternatives and similar repositories for Avatar
Users who are interested in Avatar are comparing it to the repositories listed below
- ☆31 Updated last year
- NeurIPS'2023 official implementation code ☆65 Updated last year
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners ☆148 Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 Updated last year
- A curated list of resources in audio visual question answering and related area. :-) ☆12 Updated 2 months ago
- A toolkit for computing Fréchet Inception Distance (FID) & Fréchet Video Distance (FVD) metrics. ☆36 Updated 3 months ago
- Official source codes for the paper: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing. ☆27 Updated 2 months ago
- Official implementation for CIGN ☆16 Updated last year
- Multimodal Empathetic Chatbot ☆42 Updated last year
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie… ☆21 Updated 2 months ago
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation" ☆33 Updated 10 months ago
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation ☆13 Updated 4 months ago
- [ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding ☆54 Updated last month
- ☆36 Updated 4 months ago
- Vision Transformers are Parameter-Efficient Audio-Visual Learners ☆103 Updated 2 years ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆55 Updated 11 months ago
- AI-Generated Images as Data Source: The Dawn of Synthetic Era ☆154 Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 11 months ago
- A list of current Audio-Vision Multimodal with awesome resources (paper, application, data, review, survey, etc.). ☆24 Updated last year
- LMM solved catastrophic forgetting, AAAI2025 ☆44 Updated 4 months ago
- Precision Search through Multi-Style Inputs ☆72 Updated last month
- [ACM MM24] Official implementation of paper "From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning" ☆28 Updated 3 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆52 Updated 8 months ago
- [IJCV 2025] Code for DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection ☆55 Updated 8 months ago
- The official SpeakerVid-5M data curation code. ☆34 Updated last month
- A curated list of Text-to-Video Generation papers and BibTeX entries ☆21 Updated last year
- ☆55 Updated 2 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆53 Updated last week
- ☆92 Updated 5 months ago
- Unified Audio-Visual Perception for Multi-Task Video Localization ☆27 Updated last year