zjuruizhechen / Awesome-Video-Agent
A collection of awesome papers on thinking with videos.
☆89 · Updated Dec 1, 2025
Alternatives and similar repositories for Awesome-Video-Agent
Users interested in Awesome-Video-Agent are comparing it to the repositories listed below.
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆37 · Updated Oct 9, 2025
- ☆20 · Updated Feb 13, 2025
- ☆213 · Updated Dec 19, 2025
- [CVPR 2025] Official implementation of the paper "Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Poin…" ☆16 · Updated Dec 24, 2025
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts ☆16 · Updated Feb 26, 2024
- PICABench: How Far Are We from Physically Realistic Image Editing? ☆35 · Updated Nov 5, 2025
- [ICLR 2025] PAD: Personalized Alignment of LLMs at Decoding-Time ☆18 · Updated Mar 19, 2025
- Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images ☆52 · Updated Nov 4, 2025
- The first interleaved framework for textual reasoning within the visual generation process ☆157 · Updated Nov 21, 2025
- ☆59 · Updated Jan 26, 2026
- [ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought" ☆165 · Updated Feb 4, 2026
- Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection" ☆23 · Updated Dec 23, 2024
- ☆16 · Updated Oct 4, 2024
- Code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation" ☆21 · Updated Nov 24, 2025
- ☆21 · Updated Jul 9, 2025
- [AAAI'25] Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP ☆19 · Updated Aug 5, 2025
- ☆40 · Updated Dec 16, 2025
- [EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning ☆35 · Updated Oct 22, 2025
- [LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆68 · Updated May 9, 2025
- Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…" ☆91 · Updated Jan 29, 2026
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆140 · Updated Aug 21, 2025
- Official code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆403 · Updated Jan 29, 2026
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆86 · Updated Jul 13, 2025
- ☆68 · Updated Sep 15, 2025
- ☆57 · Updated Dec 10, 2025
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ☆41 · Updated Aug 4, 2025
- Implementation of the model "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆28 · Updated Feb 9, 2026
- Code and data for the ICLR 2025 paper "MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs" ☆36 · Updated Mar 9, 2025
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆37 · Updated Nov 10, 2024
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆32 · Updated Nov 1, 2025
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models ☆71 · Updated Mar 18, 2025
- Next-Toggle is a simple plug-and-use theme toggle button with multiple light and dark themes ☆11 · Updated May 9, 2024
- Multi-step AI agents powered by Gemini 2.0 and the LangGraph framework. These agents orchestrate complex workflows and enhance their reas… ☆10 · Updated Dec 19, 2024
- [ACL 2025] Official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection" ☆26 · Updated May 26, 2025
- Official repository of Native Parallel Reasoner ☆100 · Updated Feb 5, 2026
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆38 · Updated Jan 26, 2026
- [CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding ☆60 · Updated Aug 31, 2025
- [ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation ☆31 · Updated Dec 4, 2024
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence ☆199 · Updated this week