GiantAILab / DeepDubber-V1
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
☆22 · Updated this week
Alternatives and similar repositories for DeepDubber-V1
Users who are interested in DeepDubber-V1 are comparing it to the libraries listed below.
- ☆22 · Updated last week
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation ☆39 · Updated this week
- This repository contains the code for our ICML 2025 paper, LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection 🎉 ☆22 · Updated 3 weeks ago
- [CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-… ☆39 · Updated 2 months ago
- TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge ☆119 · Updated last month
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆62 · Updated 3 weeks ago
- 📖 This is a repository for organizing papers, code, and other resources related to personalized video generation and editing. ☆37 · Updated this week
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆85 · Updated 3 weeks ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆26 · Updated 6 months ago
- 🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆28 · Updated last month
- ☆28 · Updated last month
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆25 · Updated 2 months ago
- ☆32 · Updated 3 weeks ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆28 · Updated 3 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 · Updated 10 months ago
- Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization" ☆68 · Updated 2 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆34 · Updated 5 months ago
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation ☆13 · Updated 2 months ago
- Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral]. ☆29 · Updated 7 months ago
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation ☆47 · Updated 3 weeks ago
- This repository is for the ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities Models ☆36 · Updated last week
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection ☆14 · Updated this week
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ☆54 · Updated 3 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆70 · Updated 10 months ago
- [CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization ☆31 · Updated this week
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆69 · Updated 7 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆68 · Updated 2 weeks ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆169 · Updated last week
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆23 · Updated 2 weeks ago
- [EMNLP 2024 Main] MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension ☆14 · Updated 5 months ago