linkangheng / Video-UTR

[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs

☆61

Alternatives and similar repositories for Video-UTR

Users interested in Video-UTR are comparing it to the repositories listed below.

appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆58 · Updated 5 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆127 · Updated 3 months ago
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆70 · Updated 10 months ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆125 · Updated 7 months ago
Open-Reasoner-Zero / Open-Vision-Reasoner
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…
☆144 · Updated 2 months ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆88 · Updated last year
Hui-design / Open-LLaVA-Video-R1
[LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18)
☆35 · Updated 6 months ago
www-Ye / Time-R1
R1-like Video-LLM for Temporal Grounding
☆124 · Updated 5 months ago
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆103 · Updated 3 months ago
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆85 · Updated 8 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆97 · Updated 4 months ago
Video-R1 / Awesome-Multimodal-Reasoning
Collections of Papers and Projects for Multimodal Reasoning.
☆105 · Updated 6 months ago
xiaomi-research / time-r1
[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
☆61 · Updated last month
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆64 · Updated last month
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆227 · Updated last month
yongliang-wu / NumPro
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
☆126 · Updated last month
showlab / VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆140 · Updated 10 months ago
ChenYi99 / EgoPlan
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆74 · Updated 11 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆75 · Updated last year
Becomebright / ReKV
Official PyTorch Code of ReKV (ICLR'25)
☆66 · Updated 2 weeks ago
JierunChen / Ref-L4
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
☆51 · Updated 10 months ago
shilinyan99 / CrossLMM
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
☆25 · Updated 5 months ago
appletea233 / LLaVA-ST
[CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
☆77 · Updated 4 months ago
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆71 · Updated 2 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37 · Updated last year
JoseponLee / IntentQA
Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.
☆21 · Updated 11 months ago
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆95 · Updated last year
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆55 · Updated 6 months ago
PKU-YuanGroup / Look-Back
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
☆69 · Updated 4 months ago
TencentARC / SEED-Bench-R1
☆94 · Updated 4 months ago