A collection of awesome papers on thinking with videos.
☆91 · Dec 1, 2025 · Updated 3 months ago
Alternatives and similar repositories for Awesome-Video-Agent
Users interested in Awesome-Video-Agent are comparing it to the repositories listed below.
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆38 · Oct 9, 2025 · Updated 5 months ago
- ☆21 · Feb 13, 2025 · Updated last year
- ☆214 · Dec 19, 2025 · Updated 2 months ago
- PICABench: How Far Are We from Physically Realistic Image Editing? ☆36 · Nov 5, 2025 · Updated 4 months ago
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts ☆16 · Feb 26, 2024 · Updated 2 years ago
- [CVPR 2025] Official implementation of the paper "Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Poin… ☆16 · Dec 24, 2025 · Updated 2 months ago
- Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images ☆54 · Nov 4, 2025 · Updated 4 months ago
- [ICLR 2025] Pad: Personalized alignment of LLMs at decoding-time ☆18 · Mar 19, 2025 · Updated 11 months ago
- The first interleaved framework for textual reasoning within the visual generation process ☆158 · Updated this week
- ☆63 · Jan 26, 2026 · Updated last month
- ☆21 · Jul 9, 2025 · Updated 8 months ago
- Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection" ☆23 · Dec 23, 2024 · Updated last year
- [ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought" ☆177 · Feb 4, 2026 · Updated last month
- ☆16 · Oct 4, 2024 · Updated last year
- Code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation" ☆22 · Nov 24, 2025 · Updated 3 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆219 · Oct 12, 2025 · Updated 4 months ago
- ☆40 · Dec 16, 2025 · Updated 2 months ago
- [EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning ☆36 · Oct 22, 2025 · Updated 4 months ago
- ☆21 · Jun 16, 2022 · Updated 3 years ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆64 · Jan 26, 2026 · Updated last month
- Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping… ☆91 · Jan 29, 2026 · Updated last month
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆142 · Aug 21, 2025 · Updated 6 months ago
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence ☆279 · Mar 2, 2026 · Updated last week
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆408 · Jan 29, 2026 · Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆88 · Jul 13, 2025 · Updated 7 months ago
- [CVPR 2025] CoLLM: A Large Language Model for Composed Image Retrieval ☆28 · Mar 26, 2025 · Updated 11 months ago
- ☆68 · Sep 15, 2025 · Updated 5 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ☆41 · Mar 2, 2026 · Updated last week
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆28 · Feb 9, 2026 · Updated last month
- ☆59 · Dec 10, 2025 · Updated 2 months ago
- The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn… ☆45 · Jan 5, 2026 · Updated 2 months ago
- (TIP 2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information ☆32 · Dec 26, 2024 · Updated last year
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆37 · Nov 10, 2024 · Updated last year
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆72 · Feb 9, 2026 · Updated last month
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection" ☆25 · May 26, 2025 · Updated 9 months ago
- Multi-step AI agents powered by Gemini 2.0 and the LangGraph framework. These agents orchestrate complex workflows and enhance their reas… ☆10 · Dec 19, 2024 · Updated last year
- Next-Toggle is a simple plug-and-use theme toggle button with multiple light and dark themes. ☆11 · May 9, 2024 · Updated last year
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models ☆73 · Mar 18, 2025 · Updated 11 months ago
- [ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation ☆31 · Dec 4, 2024 · Updated last year