IVUL-KAUST/VideoAuto-R1

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/IVUL-KAUST/VideoAuto-R1)

IVUL-KAUST / VideoAuto-R1

[CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

☆88

Alternatives and similar repositories for VideoAuto-R1

Users that are interested in VideoAuto-R1 are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

LaVi-Lab / Rethink_CoT_Video
View on GitHub
Official code for "Rethinking Chain-of-Thought Reasoning for Videos"
☆21Dec 14, 2025Updated 7 months ago
TencentARC / TimeLens
View on GitHub
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆158Apr 27, 2026Updated 2 months ago
aiha-lab / InfiniPot-V
View on GitHub
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20Jan 25, 2026Updated 5 months ago
mbzuai-oryx / Video-R2
View on GitHub
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
☆19Jan 21, 2026Updated 5 months ago
InternRobotics / EgoThinker
View on GitHub
Official implementation of EgoThinker at NIPS 2025
☆29Nov 25, 2025Updated 7 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
mbzuai-oryx / Video-CoM
View on GitHub
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
☆22Jun 17, 2026Updated last month
marinero4972 / Open-o3-Video
View on GitHub
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157May 1, 2026Updated 2 months ago
zhang9302002 / ThinkingWithVideos
View on GitHub
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆101Oct 15, 2025Updated 9 months ago
haowei-freesky / HERMES
View on GitHub
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆92May 8, 2026Updated 2 months ago
tulerfeng / OneThinker
View on GitHub
🔥 OneThinker: All-in-one Reasoning Model for Image and Video [CVPR 2026]
☆463Feb 28, 2026Updated 4 months ago
QiWang98 / VideoRFT
View on GitHub
[NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
☆65Jan 6, 2026Updated 6 months ago
yunlong10 / Awesome-Video-LMM-Post-Training
View on GitHub
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training
☆296Mar 3, 2026Updated 4 months ago
ShareLab-SII / FluxMem
View on GitHub
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
☆73Mar 16, 2026Updated 4 months ago
yeliudev / VideoMind
View on GitHub
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆346Feb 8, 2026Updated 5 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
Fu-Fu-Fu-Fu / VideoKR
View on GitHub
[ICML 26 Spotlight] Code for paper "VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding"
☆19Jun 5, 2026Updated last month
OpenGVLab / VideoChat-R1
View on GitHub
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268Oct 18, 2025Updated 9 months ago
Jayce1kk / SpaceVLLM
View on GitHub
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
☆17May 8, 2025Updated last year
OpenGVLab / VKnowU
View on GitHub
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆15Feb 3, 2026Updated 5 months ago
Hui-design / TSPO
View on GitHub
[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
☆131Nov 12, 2025Updated 8 months ago
Ziyang412 / Video-RTS
View on GitHub
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24Feb 18, 2026Updated 5 months ago
SKYLENAGE-AI / DeepVision-103K
View on GitHub
Codebase for DeepVision-103K
☆22Feb 21, 2026Updated 4 months ago
tulerfeng / Video-R1
View on GitHub
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆879Dec 14, 2025Updated 7 months ago
ThinkMorph / ThinkMorph
View on GitHub
[ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
☆190May 1, 2026Updated 2 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
PolyU-ChenLab / ETBench
View on GitHub
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74Jan 20, 2025Updated last year
LaVi-Lab / Visual-Table
View on GitHub
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20Oct 17, 2024Updated last year
longmalongma / TW-GRPO
View on GitHub
The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"
☆36Jun 12, 2025Updated last year
EvolvingLMMs-Lab / LongVT
View on GitHub
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
☆254Jun 24, 2026Updated 3 weeks ago
TencentARC / SEED-Bench-R1
View on GitHub
☆100Jun 23, 2025Updated last year
EvolvingLMMs-Lab / SimpleStream
View on GitHub
A simple video streaming baseline that outperforms SOTAs.
☆148May 1, 2026Updated 2 months ago
Vchitect / Uni-MMMU
View on GitHub
[ACL2026 oral] Uni-MMMU : A Massive Multi-discipline Multimodal Unified Benchmark
☆25Apr 13, 2026Updated 3 months ago
qirui-chen / MultiHop-EgoQA
View on GitHub
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38May 27, 2025Updated last year
xiaomi-research / time-r1
View on GitHub
[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
☆95Dec 14, 2025Updated 7 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
DripNowhy / Octopus
View on GitHub
[ICML 2026] Official implementation for paper: Learning Self-Correction in Vision–Language Models via Rollout Augmentation
☆16Jun 4, 2026Updated last month
daeunni / Video-Skill-CoT
View on GitHub
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"
☆18Aug 27, 2025Updated 10 months ago
FeiElysia / Tempo
View on GitHub
Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding (ECCV 2026)
☆76Jun 29, 2026Updated 3 weeks ago
cg1177 / Recursive-Multimodal-Agent
View on GitHub
☆19Jul 1, 2026Updated 2 weeks ago
aim-uofa / Omni-R1
View on GitHub
[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
☆126Dec 3, 2025Updated 7 months ago
KD-TAO / OmniZip
View on GitHub
[CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
☆99Apr 20, 2026Updated 3 months ago
fmu2 / snag_release
View on GitHub
Official Implementation of SnAG (CVPR 2024)
☆59Apr 26, 2025Updated last year