zjuruizhechen/Awesome-Video-Agent


A collection of awesome "thinking with videos" papers.

☆91

Alternatives and similar repositories for Awesome-Video-Agent

Users interested in Awesome-Video-Agent are also comparing it to the repositories listed below.

lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆38 · Oct 9, 2025 · Updated 5 months ago
humanlayer / genai-the-good-parts
☆21 · Feb 13, 2025 · Updated last year
ByteDance-Seed / Seed-1.8
☆214 · Dec 19, 2025 · Updated 2 months ago
Andrew0613 / PICABench
PICABench: How Far Are We from Physically Realistic Image Editing?
☆36 · Nov 5, 2025 · Updated 4 months ago
YuejiangLIU / csl
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
☆16 · Feb 26, 2024 · Updated 2 years ago
auniquesun / Point-Cache
[CVPR 2025] Official implementation of the paper "Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Poin…
☆16 · Dec 24, 2025 · Updated 2 months ago
HKU-MMLab / Math-VR-CodePlot-CoT
Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
☆54 · Nov 4, 2025 · Updated 4 months ago
zjuruizhechen / PAD
[ICLR 2025] PAD: Personalized Alignment of LLMs at Decoding-Time
☆18 · Mar 19, 2025 · Updated 11 months ago
ZiyuGuo99 / Thinking-while-Generating
The first interleaved framework for textual reasoning within the visual generation process
☆158 · Updated this week
8421BCD / Agentic-R
☆63 · Jan 26, 2026 · Updated last month
VisuLogic-Benchmark / VisuLogic-Train
☆21 · Jul 9, 2025 · Updated 8 months ago
marco-garosi / ComCa
Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection"
☆23 · Dec 23, 2024 · Updated last year
InternLM / SIM-CoT
[ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"
☆177 · Feb 4, 2026 · Updated last month
PinxueGuo / X-Prompt
☆16 · Oct 4, 2024 · Updated last year
baoxiaoyi / CoReS
Code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation"
☆22 · Nov 24, 2025 · Updated 3 months ago
thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…
☆219 · Oct 12, 2025 · Updated 4 months ago
HongbangYuan / OmniReward
☆40 · Dec 16, 2025 · Updated 2 months ago
zjuruizhechen / TVG-R1
[EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
☆36 · Oct 22, 2025 · Updated 4 months ago
anpwu / ZJU-CS-ClassNotes
☆21 · Jun 16, 2022 · Updated 3 years ago
PRIME-RL / RL-Compositionality
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
☆64 · Jan 26, 2026 · Updated last month
WooooDyy / BAPO
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…
☆91 · Jan 29, 2026 · Updated last month
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆142 · Aug 21, 2025 · Updated 6 months ago
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆279 · Mar 2, 2026 · Updated last week
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆408 · Jan 29, 2026 · Updated last month
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆88 · Jul 13, 2025 · Updated 7 months ago
hmchuong / CoLLM
[CVPR25] CoLLM: A Large Language Model for Composed Image Retrieval
☆28 · Mar 26, 2025 · Updated 11 months ago
SalesforceAIResearch / LATTE
☆68 · Sep 15, 2025 · Updated 5 months ago
V-STaR-Bench / V-STaR
Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
☆41 · Mar 2, 2026 · Updated last week
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆28 · Feb 9, 2026 · Updated last month
DeepExperience / LoopTool
☆59 · Dec 10, 2025 · Updated 2 months ago
Shenzhi-Wang / Beyond-the-80-20-Rule-RLVR
The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn…
☆45 · Jan 5, 2026 · Updated 2 months ago
yangbang18 / CARE
(TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information
☆32 · Dec 26, 2024 · Updated last year
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37 · Nov 10, 2024 · Updated last year
SaFo-Lab / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆72 · Feb 9, 2026 · Updated last month
destroy-lonely / MIND
[ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".
☆25 · May 26, 2025 · Updated 9 months ago
jiasin88 / multi-step-ai-agent
Multi-step AI agents powered by Gemini 2.0 and the LangGraph framework. These agents orchestrate complex workflows and enhance their reas…
☆10 · Dec 19, 2024 · Updated last year
MODSetter / next-toggle
Next-Toggle is a simple plug-and-use theme toggle button with multiple light and dark themes.
☆11 · May 9, 2024 · Updated last year
Wild-Cooperation-Hub / Awesome-MLLM-Reasoning-Benchmarks
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
☆73 · Mar 18, 2025 · Updated 11 months ago
ut-vision / ActionVOS
[ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation
☆31 · Dec 4, 2024 · Updated last year