chaxjli / U-MARVEL
☆24Updated 2 months ago
Alternatives and similar repositories for U-MARVEL
Users who are interested in U-MARVEL are comparing it to the repositories listed below
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆24Updated 6 months ago
- Official repository of MMDU dataset☆95Updated last year
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant☆161Updated 3 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆53Updated 4 months ago
- R1-like Video-LLM for Temporal Grounding☆117Updated 3 months ago
- ☆37Updated last year
- ☆25Updated last year
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆118Updated 2 months ago
- ☆153Updated 11 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆75Updated 6 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆126Updated last month
- ☆25Updated 3 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆81Updated 3 weeks ago
- ☆108Updated last week
- ☆80Updated 10 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding☆51Updated 3 months ago
- Official implementation of MIA-DPO☆66Updated 8 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆42Updated last year
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆73Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆98Updated 10 months ago
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆66Updated 4 months ago
- This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehens…☆73Updated 5 months ago
- ☆76Updated last year
- The Next Step Forward in Multimodal LLM Alignment☆181Updated 5 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆48Updated 6 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆86Updated last year
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆128Updated 4 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆122Updated 6 months ago
- [NeurIPS 2024] The official code of the paper "Automated Multi-level Preference for MLLMs"☆19Updated last year
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆37Updated 10 months ago