wxh1996/VideoAgent

☆150

Alternatives and similar repositories for VideoAgent

Users that are interested in VideoAgent are comparing it to the libraries listed below.

YueFan1014 / VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
☆320 · Dec 5, 2024 · Updated last year
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆165 · Jun 23, 2025 · Updated last year
ruili33 / TPO
☆41 · Sep 9, 2025 · Updated 10 months ago
fansunqi / AKeyS
Agentic Keyframe Search for Video Question Answering
☆18 · Jun 30, 2026 · Updated 3 weeks ago
egoschema / EgoSchema
☆117 · Dec 30, 2024 · Updated last year
ziplab / LongVLM
☆108 · Jul 30, 2024 · Updated last year
PanasonicConnect / VideoMultiAgents
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering
☆39 · May 7, 2025 · Updated last year
agentic-learning-ai-lab / lifelong-memory
Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
☆33 · Oct 27, 2025 · Updated 8 months ago
jongwoopark7978 / LVNet
[Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet.
☆44 · Feb 10, 2026 · Updated 5 months ago
wenhaochai / MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆704 · Jan 29, 2025 · Updated last year
kkahatapitiya / LangRepo
Code for our ACL 2025 paper "Language Repository for Long Video Understanding"
☆36 · Jun 17, 2024 · Updated 2 years ago
doc-doc / NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
☆189 · Aug 2, 2025 · Updated 11 months ago
microsoft / DeepVideoDiscovery
Deep Video Discovery (DVD) is a deep-research style question answering agent designed for understanding extra-long videos.
☆403 · Nov 3, 2025 · Updated 8 months ago
traveler-framework / TraveLER
[EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
☆18 · Oct 31, 2024 · Updated last year
CeeZh / SILVR
Official Implementation for "SiLVR: A Simple Language-based Video Reasoning Framework"
☆19 · Jan 18, 2026 · Updated 6 months ago
huangb23 / VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
☆295 · Jun 13, 2024 · Updated 2 years ago
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆350 · Jul 19, 2024 · Updated 2 years ago
ttengwang / Awesome_Long_Form_Video_Understanding
Awesome papers & datasets specifically focused on long-term videos.
☆381 · Oct 9, 2025 · Updated 9 months ago
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆106 · Oct 27, 2024 · Updated last year
doc-doc / NExT-OE
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
☆30 · Jul 18, 2023 · Updated 3 years ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆91 · Jun 19, 2026 · Updated last month
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆72 · Jul 10, 2024 · Updated 2 years ago
longvideobench / LongVideoBench
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆133 · Jul 27, 2024 · Updated last year
Tencent-QQMM / Video-CCAM
A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.
☆74 · Oct 14, 2024 · Updated last year
bigai-nlco / VideoLLaMB
[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
☆87 · Feb 27, 2025 · Updated last year
yunlong10 / Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
☆3,246 · Jun 13, 2026 · Updated last month
mbzuai-oryx / Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the cap…
☆1,503 · Aug 5, 2025 · Updated 11 months ago
MGitHubL / TMac
☆14 · Feb 26, 2024 · Updated 2 years ago
WHB139426 / GCG
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
☆10 · Jul 22, 2024 · Updated 2 years ago
MAC-AutoML / WFS-SB
[CVPR 2026] Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
☆31 · Apr 12, 2026 · Updated 3 months ago
Leon1207 / Video-RAG-master
✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…
☆446 · Jun 26, 2026 · Updated 3 weeks ago
bigai-nlco / VideoTGB
[EMNLP 2024] A Video Chat Agent with Temporal Prior
☆33 · Mar 2, 2025 · Updated last year
gyxxyg / VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆130 · Dec 10, 2024 · Updated last year
scofield7419 / Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆182 · Feb 25, 2025 · Updated last year
RUC-NLPIR / VideoDeepResearch
☆155 · Nov 17, 2025 · Updated 8 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆881 · Dec 14, 2025 · Updated 7 months ago
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆89 · Jul 1, 2024 · Updated 2 years ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133 · Apr 4, 2025 · Updated last year
wenhaochai / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆147 · Jun 4, 2025 · Updated last year