yangruoliu/VideoDetective

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

☆58

Alternatives and similar repositories for VideoDetective

Users that are interested in VideoDetective are comparing it to the libraries listed below.

MiG-NJU / PersonaVLM
[CVPR 2026 Highlight] PersonaVLM: Long-Term Personalized Multimodal LLMs
☆112 · Apr 16, 2026 · Updated 3 months ago
VITA-MLLM / Omni-Diffusion
✨✨[ICML 2026] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
☆153 · Mar 12, 2026 · Updated 4 months ago
VITA-MLLM / VITA-QinYu
VITA-QINYU: Expressive Spoken Language Model for Role-Playing and Singing
☆121 · Jul 14, 2026 · Updated last week
Tencent / VITA
The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆162 · Oct 28, 2025 · Updated 8 months ago
VITA-MLLM / Sparrow
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
☆32 · Mar 28, 2025 · Updated last year
MME-Benchmarks / Video-MME-v2
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆369 · May 24, 2026 · Updated 2 months ago
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42 · Apr 10, 2025 · Updated last year
VITA-MLLM / Long-VITA
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
☆305 · May 14, 2025 · Updated last year
pwang322 / Efficient-Multimodal-Learning-Survey
The official GitHub page for the survey paper "From Models to Systems: A Comprehensive Survey of Efficient Multimodal Learning". And this…
☆20 · Jun 8, 2026 · Updated last month
Jialuo-Li / DIG
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21 · Feb 21, 2026 · Updated 5 months ago
zsgvivo / VideoZoomer
☆34 · Feb 12, 2026 · Updated 5 months ago
MAC-AutoML / QuoTA
✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…
☆79 · Apr 28, 2025 · Updated last year
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆198 · May 1, 2025 · Updated last year
aim-uofa / ConvNova
☆13 · Apr 23, 2025 · Updated last year
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 · Oct 9, 2025 · Updated 9 months ago
64327069 / LVAgent
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆39 · Nov 24, 2025 · Updated 8 months ago
yfzhang114 / r1_reward
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆291 · May 9, 2025 · Updated last year
yfzhang114 / Thyme
✨✨ [ICLR 2026] Think Beyond Images
☆583 · Sep 23, 2025 · Updated 10 months ago
wangruohui / EfficientVideoAgent
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
☆26 · May 6, 2026 · Updated 2 months ago
SalesforceAIResearch / ActiveVideoPerception
Official Code for paper "Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding"
☆18 · Jun 2, 2026 · Updated last month
aiha-lab / InfiniPot-V
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20 · Jan 25, 2026 · Updated 5 months ago
lezhang7 / MOQAGPT
[EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs
☆13 · Dec 28, 2024 · Updated last year
JPShi12 / VideoLoom
[ICML 2026] VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
☆27 · Jul 3, 2026 · Updated 3 weeks ago
CARE-Edit / Code
[CVPR 2026] A unified editor with four heterogeneous experts via condition-aware router. This repo is the official code for "CARE-Edit: C…
☆33 · Jun 15, 2026 · Updated last month
sail-sg / Video-Next-Event-Prediction
☆28 · Aug 9, 2025 · Updated 11 months ago
MILVLG / videoarm
☆27 · Apr 9, 2026 · Updated 3 months ago
NVlabs / VideoITG
[CVPR 2026 Highlight] VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
☆126 · Apr 17, 2026 · Updated 3 months ago
Hoar012 / TDC-Video
Official implementation of TDC.
☆15 · Jul 22, 2025 · Updated last year
Fr0zenCrane / Uni-ViGU
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
☆33 · Apr 15, 2026 · Updated 3 months ago
PRIME-RL / P1-VL
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
☆15 · Feb 11, 2026 · Updated 5 months ago
Leon1207 / Video-RAG-master
✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…
☆446 · Jun 26, 2026 · Updated 3 weeks ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 · Dec 19, 2025 · Updated 7 months ago
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Apr 18, 2026 · Updated 3 months ago
MAC-AutoML / WFS-SB
[CVPR 2026] Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
☆32 · Apr 12, 2026 · Updated 3 months ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
qiujihao19 / LongVideo-R1
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
☆50 · Jul 7, 2026 · Updated 2 weeks ago
egolife-ai / Ego-R1
[TPAMI 2026] Ego-R1: Agentic Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆165 · Jun 10, 2026 · Updated last month
wangf3014 / Patch_Scaling
Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
☆25 · Feb 25, 2025 · Updated last year