Vincent-ZHQ/Comprehensive-Long-Video-Understanding-Survey

A survey on MM-LLMs for long video understanding: "From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding"

☆23

Alternatives and similar repositories for Comprehensive-Long-Video-Understanding-Survey

Users interested in Comprehensive-Long-Video-Understanding-Survey are comparing it to the repositories listed below.

qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
zchoi / SPT
[TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning".
☆10 · Aug 14, 2024 · Updated last year
mightyzau / InfMLLM
☆19 · Dec 6, 2023 · Updated 2 years ago
dddraxxx / Ref-Adv
[ICLR 2026] Official code for "Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks"
☆26 · Mar 2, 2026 · Updated 4 months ago
EchoSafe-MLLM / EchoSafe
[CVPR 2026] Code for Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory
☆15 · Mar 18, 2026 · Updated 4 months ago
mlvlab / DeepVideoR1
[NeurIPS 2025] Official implementation (PyTorch) of "DeepVideo-R1"
☆36 · Feb 22, 2026 · Updated 5 months ago
mshukor / eP-ALM
[ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.
☆27 · Oct 27, 2023 · Updated 2 years ago
ChongjianGE / SNCLR
[ICLR 2023] Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
☆15 · Aug 2, 2023 · Updated 2 years ago
opendatalab / image-downloader
☆31 · May 13, 2024 · Updated 2 years ago
steven-ccq / ViLAMP
[ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"
☆194 · Sep 23, 2025 · Updated 10 months ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆16 · Feb 3, 2026 · Updated 5 months ago
Sen-Yao / RCS-Calculator
A Python script to calculate radar cross sections.
☆12 · Dec 26, 2023 · Updated 2 years ago
Jianglin954 / LGI-LS
[NeurIPS 2023] Latent Graph Inference with Limited Supervision
☆33 · Feb 1, 2024 · Updated 2 years ago
Jianglin954 / QCQC
[ICLR 2026] Seeing Through Words: Controlling Visual Retrieval Quality with Language Models
☆28 · Mar 19, 2026 · Updated 4 months ago
reachpranjal / lego-drive
[Official] [IROS 2024] A goal-oriented planning to lift VLN performance for Closed-Loop Navigation: Simple, Yet Effective
☆28 · Apr 4, 2024 · Updated 2 years ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆54 · Jun 12, 2025 · Updated last year
Mr-Neko / JM3D
The official implementation of JM3D.
☆31 · Apr 8, 2026 · Updated 3 months ago
elianastasio / MiniGPT-Pancreas
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection
☆12 · Sep 19, 2025 · Updated 10 months ago
hukcc / Awesome-Video-Hallucination
[ACL 2026] Paper list of Video LLM hallucination. Welcome to Star and Contribute!
☆36 · Jul 1, 2026 · Updated 3 weeks ago
frycast / studentlife
Tidy handling and navigation of the valuable Student-Life mHealth dataset
☆24 · Apr 22, 2021 · Updated 5 years ago
gqk / RelayGS
RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians
☆14 · Dec 5, 2024 · Updated last year
hy0Y / ST-GT
[CVPR 2024] Official repository of ST_GT
☆10 · Sep 15, 2024 · Updated last year
Manu21JC / DataElixir
[AAAI 2024] DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models
☆12 · Dec 5, 2024 · Updated last year
VividLe / ExtractVideoFeature
Extracts video features. Currently supports I3D; more models will be added over time.
☆12 · Jun 4, 2020 · Updated 6 years ago
chenxinli001 / StegaNeRF
Official PyTorch implementation of "StegaNeRF: Embedding Invisible Information within Neural Radiance Fields" (ICCV 2023)
☆47 · Nov 23, 2024 · Updated last year
XTCHDU / anti_jamming
☆12 · Jan 12, 2019 · Updated 7 years ago
lwq20020127 / OmniDrag
[IJCV 2025] OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation
☆16 · Feb 13, 2026 · Updated 5 months ago
alibaba-mmai-research / HiCo
[CVPR 2022] Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
☆18 · Aug 10, 2022 · Updated 3 years ago
rlqja1107 / NL-VSGG
Official PyTorch implementation of Weakly Supervised Video Scene Graph Generation via Natural Language Supervision, accepted…
☆25 · Jun 13, 2025 · Updated last year
FeiElysia / Tempo
Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding, ECCV 2026
☆77 · Updated this week
ruizhao26 / BSF
Official code for "Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations" (CVPR 2024)
☆10 · Jul 31, 2024 · Updated last year
CoinCheung / MFM
Code for the paper "Masked Frequency Modeling for Self-Supervised Visual Pre-Training" (https://arxiv.org/pdf/2206.07706.pdf)
☆24 · Feb 3, 2023 · Updated 3 years ago
NickGeramanis / rl-uav
Undergraduate Thesis.
☆11 · Apr 13, 2025 · Updated last year
PVIT-official / PVIT
Repository for the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models"
☆37 · Sep 19, 2023 · Updated 2 years ago
mrezaakbari / DeepFake
Real-time face swap and one-click video deepfake from a single image
☆13 · Sep 10, 2024 · Updated last year
Lihy256 / MSCDUnet
☆30 · Nov 2, 2023 · Updated 2 years ago
xuanyuzhang21 / CRoSS
[NeurIPS 2023] Official PyTorch implementation for the paper "CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganog…
☆11 · Sep 28, 2023 · Updated 2 years ago
NY1024 / ClawGuard
ClawGuard is a comprehensive security toolkit designed to mitigate risks associated with autonomous agents, such as OpenClaw and other LL…
☆25 · Apr 25, 2026 · Updated 3 months ago
SCZwangxiao / video-ReTaKe
Official implementation of the paper "ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding"
☆40 · Mar 16, 2025 · Updated last year