VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆63Jan 9, 2026Updated last month
Alternatives and similar repositories for VideoAuto-R1
Users that are interested in VideoAuto-R1 are comparing it to the libraries listed below
Sorting:
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆49Jan 30, 2026Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 7 months ago
- ☆18Jun 10, 2025Updated 8 months ago
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆103Updated this week
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 3 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos"☆20Dec 14, 2025Updated 2 months ago
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆55Feb 1, 2026Updated 3 weeks ago
- ☆21Feb 13, 2026Updated 2 weeks ago
- Official Repository of paper: "MotionEdit: Benchmarking and Learning Motion-Centric Image Editing"☆59Jan 20, 2026Updated last month
- ☆23Jul 20, 2025Updated 7 months ago
- Agent-RRM: Exploring Reasoning Reward Model for Agents☆44Feb 4, 2026Updated 3 weeks ago
- Repository for ACL2020 paper "Refer360° A Referring Expression Recognition Dataset in 360°Images"☆13Jun 26, 2021Updated 4 years ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"☆22Dec 8, 2024Updated last year
- [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images?☆20Oct 20, 2025Updated 4 months ago
- MobileLLM-R1☆75Sep 30, 2025Updated 4 months ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM☆20May 22, 2025Updated 9 months ago
- ☆84Jan 4, 2026Updated last month
- A Comprehensive Dataset for Advanced Image Generation and Editing}☆31Oct 2, 2025Updated 4 months ago
- \infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆19Feb 14, 2025Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"☆30Jan 12, 2026Updated last month
- [CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling☆191Updated this week
- This project is the official implementation of 'DreamOmni3: Scribble-based Editing and Generation''☆37Dec 30, 2025Updated 2 months ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆23Oct 22, 2025Updated 4 months ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆56Mar 27, 2025Updated 11 months ago
- Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation: A framework for generating multimodal music by bridging dif…☆28Jan 21, 2025Updated last year
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding☆24Mar 24, 2025Updated 11 months ago
- [AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers☆43Jan 1, 2026Updated last month
- paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi…☆34Dec 27, 2025Updated 2 months ago
- ☆130Dec 24, 2025Updated 2 months ago
- ☆28Apr 8, 2025Updated 10 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- ☆34Oct 9, 2025Updated 4 months ago
- [ICLR 2026] "VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?", Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, L…☆37Jan 30, 2026Updated last month
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆114Dec 24, 2025Updated 2 months ago
- ☆78Jan 22, 2026Updated last month
- Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark☆28Apr 22, 2025Updated 10 months ago
- Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation☆28Apr 11, 2025Updated 10 months ago
- SFT+RL boosts multimodal reasoning☆46Jun 27, 2025Updated 8 months ago