QQ-MM/Video-CCAM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/QQ-MM/Video-CCAM)

QQ-MM / Video-CCAM

A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.

☆74

Alternatives and similar repositories for Video-CCAM

Users that are interested in Video-CCAM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

KangarooGroup / Kangaroo
View on GitHub
official impelmentation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
☆68Aug 30, 2024Updated last year
yellow-binary-tree / HawkEye
View on GitHub
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆47Apr 29, 2024Updated 2 years ago
TencentARC / ST-LLM
View on GitHub
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆155Sep 10, 2024Updated last year
Hon-Wong / ByteVideoLLM
View on GitHub
[ICCV 2025] Dynamic-VLM
☆28Dec 16, 2024Updated last year
Share14 / ShareGemini
View on GitHub
☆32Jul 29, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
TimeMarker-LLM / TimeMarker
View on GitHub
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆107Nov 28, 2024Updated last year
llyx97 / TempCompass
View on GitHub
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆131Apr 4, 2025Updated last year
EvolvingLMMs-Lab / LongVA
View on GitHub
Long Context Transfer from Language to Vision
☆405Mar 18, 2025Updated last year
DCDmllm / Momentor
View on GitHub
☆81Nov 24, 2024Updated last year
ttengwang / Awesome_Long_Form_Video_Understanding
View on GitHub
Awesome papers & datasets specifically focused on long-term videos.
☆378Oct 9, 2025Updated 8 months ago
RifleZhang / LLaVA-Hound-DPO
View on GitHub
☆157Oct 31, 2024Updated last year
Richard-61 / FineAction
View on GitHub
The official codebase of FineAction dataset. We will update the data and code of our FineAction.
☆24Apr 10, 2025Updated last year
rxtan2 / Koala-video-llm
View on GitHub
☆37Sep 16, 2024Updated last year
RUCAIBox / Event-Bench
View on GitHub
Official code of *Towards Event-oriented Long Video Understanding*
☆12Jul 26, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
gyxxyg / VTG-LLM
View on GitHub
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆128Dec 10, 2024Updated last year
IVGSZ / Flash-VStream
View on GitHub
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆281Oct 15, 2025Updated 7 months ago
bytedance / tarsier
View on GitHub
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with g…
☆545Aug 14, 2025Updated 9 months ago
magic-research / PLLaVA
View on GitHub
Official repository for the paper PLLaVA
☆670Jul 28, 2024Updated last year
FreedomIntelligence / LongLLaVA
View on GitHub
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211Jan 6, 2025Updated last year
doc-doc / NExT-GQA
View on GitHub
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆89Jul 1, 2024Updated last year
xmu-xiaoma666 / Multimodal-Open-O1
View on GitHub
Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…
☆28Sep 25, 2024Updated last year
MME-Benchmarks / Video-MME
View on GitHub
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆779Dec 8, 2025Updated 6 months ago
RenShuhuai-Andy / TimeChat
View on GitHub
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
☆421May 8, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
egoschema / EgoSchema
View on GitHub
☆116Dec 30, 2024Updated last year
Yaxin9Luo / Gamma-MOD
View on GitHub
[ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆43Oct 28, 2025Updated 7 months ago
JIA-Lab-research / LLaMA-VID
View on GitHub
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861Jul 29, 2024Updated last year
Vision-CAIR / Infinibench
View on GitHub
Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
☆20Nov 4, 2025Updated 7 months ago
SCZwangxiao / video-ReTaKe
View on GitHub
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40Mar 16, 2025Updated last year
bytedance / Shot2Story
View on GitHub
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
☆175Jan 30, 2025Updated last year
hshjerry / VideoEspresso
View on GitHub
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆141Jul 28, 2025Updated 10 months ago
OmniMMI / M4
View on GitHub
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
☆18Apr 2, 2025Updated last year
QQ-MM / PureMM
View on GitHub
☆21Feb 29, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
RUC-NLPIR / VideoDeepResearch
View on GitHub
☆149Nov 17, 2025Updated 6 months ago
RupertLuo / Valley
View on GitHub
The official repository of "Video assistant towards large language model makes everything easy"
☆230Dec 24, 2024Updated last year
md-mohaiminul / BIMBA
View on GitHub
☆29Jul 25, 2025Updated 10 months ago
orrzohar / Video-STaR
View on GitHub
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆72Jul 10, 2024Updated last year
JUNJIE99 / MLVU
View on GitHub
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
☆261Apr 13, 2026Updated last month
CASIA-IVA-Lab / OPT_Questioner
View on GitHub
Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
☆15Aug 9, 2023Updated 2 years ago
DAMO-NLP-SG / VideoLLaMA2
View on GitHub
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,299Jan 23, 2025Updated last year