TencentARC/ARC-Hunyuan-Video-7B

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/TencentARC/ARC-Hunyuan-Video-7B)

TencentARC / ARC-Hunyuan-Video-7B

Structured Video Comprehension of Real-World Shorts

☆231

Alternatives and similar repositories for ARC-Hunyuan-Video-7B

Users that are interested in ARC-Hunyuan-Video-7B are comparing it to the libraries listed below

Sorting:

TencentARC / Video-Holmes
View on GitHub
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆88Jul 13, 2025Updated 7 months ago
TencentARC / TimeLens
View on GitHub
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆105Feb 26, 2026Updated last week
TencentARC / ARC-Chapter
View on GitHub
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆34Nov 19, 2025Updated 3 months ago
TencentARC / Divot
View on GitHub
Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)
☆86Feb 27, 2025Updated last year
bytedance / F-16
View on GitHub
F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr…
☆34Jul 3, 2025Updated 8 months ago
marinero4972 / Open-o3-Video
View on GitHub
Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆133Dec 18, 2025Updated 2 months ago
nusnlp / d2vlm
View on GitHub
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24Jan 1, 2026Updated 2 months ago
DwanZhang-AI / SePPO
View on GitHub
Code for "SePPO: Semi-Policy Preference Optimization for Diffusion Alignment."
☆18Oct 7, 2024Updated last year
qiulu66 / EgoPlan-Bench2
View on GitHub
☆27Apr 11, 2025Updated 10 months ago
KD-TAO / OmniZip
View on GitHub
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
☆56Feb 1, 2026Updated last month
DCDmllm / Momentor
View on GitHub
☆80Nov 24, 2024Updated last year
VidCapBench / VidCapBench
View on GitHub
☆13May 17, 2025Updated 9 months ago
Kwai-Keye / Keye
View on GitHub
☆718Feb 5, 2026Updated last month
yellow-binary-tree / HawkEye
View on GitHub
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆46Apr 29, 2024Updated last year
xiaomi-research / time-r1
View on GitHub
[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
☆79Dec 14, 2025Updated 2 months ago
TencentARC / SEED-Bench-R1
View on GitHub
☆98Jun 23, 2025Updated 8 months ago
wdrink / SimpleAR
View on GitHub
Pytorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
☆425Jun 20, 2025Updated 8 months ago
appletea233 / Temporal-R1
View on GitHub
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆60Jun 6, 2025Updated 8 months ago
TencentARC / TokLIP
View on GitHub
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆235Aug 18, 2025Updated 6 months ago
zjucsq / PLA
View on GitHub
[ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision
☆12Sep 17, 2023Updated 2 years ago
tulerfeng / Video-R1
View on GitHub
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆831Dec 14, 2025Updated 2 months ago
jiyt17 / IDA-VLM
View on GitHub
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37Nov 27, 2024Updated last year
zhouyiks / CoLVA
View on GitHub
☆43Jul 9, 2025Updated 7 months ago
X-Omni-Team / X-Omni
View on GitHub
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
☆420Aug 26, 2025Updated 6 months ago
ShoufaChen / PixelFlow
View on GitHub
Pixel-Space Generative Models
☆303May 11, 2025Updated 9 months ago
ZGCTroy / RealCam-Vid
View on GitHub
open-sourced video dataset with dynamic scenes and camera movements annotation
☆86Apr 24, 2025Updated 10 months ago
CUC-MIPG / UniVid
View on GitHub
Official code of "UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models" WACV2026
☆36Nov 24, 2025Updated 3 months ago
THUNLP-MT / MUSEG
View on GitHub
Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding".
☆39Jun 9, 2025Updated 8 months ago
Jayce1kk / SpaceVLLM
View on GitHub
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
☆16May 8, 2025Updated 9 months ago
cambrian-mllm / cambrian-s
View on GitHub
Cambrian-S: Towards Spatial Supersensing in Video
☆500Dec 27, 2025Updated 2 months ago
RobertLuo1 / iccv2023_RVOS_Challenge
View on GitHub
[ICCV 2023 Workshop] The Official Implementation of The First Prize Solution for RVOS Competition
☆14Jan 1, 2024Updated 2 years ago
Andrew0613 / PICABench
View on GitHub
PICABench: How Far Are We from Physically Realistic Image Editing?
☆36Nov 5, 2025Updated 4 months ago
JoeLeelyf / Skyra
View on GitHub
[CVPR2026] Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
☆38Dec 22, 2025Updated 2 months ago
TencentARC / Moto
View on GitHub
[ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
☆164Oct 1, 2025Updated 5 months ago
Gen-Verse / HermesFlow
View on GitHub
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆76Sep 19, 2025Updated 5 months ago
Lzq5 / UniTime
View on GitHub
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆46Nov 25, 2025Updated 3 months ago
yunlong10 / CAT-V
View on GitHub
[AAAI 26 Demo] Offical repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…
☆64Jan 27, 2026Updated last month
NVlabs / Long-RL
View on GitHub
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆693Sep 24, 2025Updated 5 months ago
chenllliang / DreamEngine
View on GitHub
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!
☆121Mar 4, 2025Updated last year