[ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
☆56 · Jul 1, 2025 · Updated 8 months ago
Alternatives and similar repositories for CREMA
Users interested in CREMA are comparing it to the libraries listed below.
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆52 · Dec 5, 2024 · Updated last year
- [TCSVT] Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection ☆17 · Jul 22, 2023 · Updated 2 years ago
- Official Implementation for "SiLVR: A Simple Language-based Video Reasoning Framework" ☆19 · Jan 18, 2026 · Updated last month
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering ☆196 · Jan 14, 2024 · Updated 2 years ago
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning ☆23 · Jan 19, 2025 · Updated last year
- ☆26 · Jun 20, 2024 · Updated last year
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation ☆54 · Jan 22, 2025 · Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- ☆30 · May 9, 2024 · Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆27 · Jan 17, 2026 · Updated last month
- Official implementation of "ControlFace: Harnessing Facial Parametric Control for Face Rigging". ☆42 · Mar 5, 2025 · Updated 11 months ago
- Official implementation of "VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis" ☆20 · Jan 26, 2025 · Updated last year
- (EMNLP 2025 Main) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives ☆37 · Dec 20, 2025 · Updated 2 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆83 · Jul 1, 2024 · Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 · Nov 29, 2023 · Updated 2 years ago
- ☆25 · Mar 30, 2025 · Updated 11 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 · Sep 13, 2024 · Updated last year
- ☆20 · Apr 26, 2024 · Updated last year
- [AAAI 2026] Official implementation of DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation ☆78 · Jun 11, 2025 · Updated 8 months ago
- This is the official repository for "LatentMan: Generating Consistent Animated Characters using Image Diffusion Models" [CVPRW 2024] ☆22 · Jul 21, 2024 · Updated last year
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2… ☆24 · Jun 13, 2024 · Updated last year
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆12 · Feb 27, 2024 · Updated 2 years ago
- Codebase for the paper HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View ☆13 · Jun 5, 2024 · Updated last year
- Qwen-SAM is a reasoning-based segmentation model that integrates Qwen 2.5 VL 7B with the Segment Anything Model (SAM), enabling fine-grai… ☆24 · Jun 4, 2025 · Updated 8 months ago
- ☆19 · Apr 23, 2025 · Updated 10 months ago
- Implementation for "Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffu… ☆13 · Sep 8, 2023 · Updated 2 years ago
- [CVPR'2024]: UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion ☆53 · Nov 23, 2024 · Updated last year
- ☆22 · Sep 16, 2025 · Updated 5 months ago
- ☆12 · Dec 15, 2023 · Updated 2 years ago
- ☆11 · Jun 28, 2024 · Updated last year
- ☆12 · Jan 25, 2024 · Updated 2 years ago
- This is the official code for the paper "Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaborati… ☆12 · Aug 13, 2024 · Updated last year
- [CVPR2025] Event Ellipsometer: Event-based Mueller-Matrix Video Imaging ☆11 · Apr 7, 2025 · Updated 10 months ago
- Skybox previewer and generator using BlockadeLabs ☆15 · May 13, 2023 · Updated 2 years ago
- [ECCV 2024] Official code for: SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer ☆112 · Jun 30, 2025 · Updated 8 months ago
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning" [EMNLP 2025 Findings] ☆15 · Aug 27, 2025 · Updated 6 months ago
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer". ☆55 · Oct 21, 2025 · Updated 4 months ago
- Official code for DAM: Dynamic Adapter Merging for Continual Video QA Learning ☆14 · Apr 25, 2024 · Updated last year
- Official Implementation of Nabla-GFlowNet (ICLR 2025) ☆28 · May 3, 2025 · Updated 10 months ago