Hon-Wong/ByteVideoLLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Hon-Wong/ByteVideoLLM)

Hon-Wong / ByteVideoLLM

[ICCV 2025] Dynamic-VLM

☆28

Alternatives and similar repositories for ByteVideoLLM

Users that are interested in ByteVideoLLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

paulgavrikov / visualoverload
View on GitHub
VisualOverload (CVPR 2026) is a VQA benchmark for image understanding in dense, high-resolution scenes.
☆18May 31, 2026Updated last month
SCZwangxiao / video-ReTaKe
View on GitHub
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40Mar 16, 2025Updated last year
TimeMarker-LLM / TimeMarker
View on GitHub
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆107Nov 28, 2024Updated last year
hshjerry / VideoEspresso
View on GitHub
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140Jul 28, 2025Updated 11 months ago
Vision-CAIR / Infinibench
View on GitHub
Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
☆20Nov 4, 2025Updated 8 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
Tencent-QQMM / Video-CCAM
View on GitHub
A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.
☆73Oct 14, 2024Updated last year
QiuHeqian / mmdetection-ref
View on GitHub
☆10Jan 9, 2025Updated last year
Dynamics-X / Thinking-in-Dynamics
View on GitHub
[CVPR 2026]"Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World"
☆17Jul 7, 2026Updated 2 weeks ago
Anonymous1252022 / Megatron-DeepSpeed
View on GitHub
☆18Sep 22, 2024Updated last year
letitiabanana / PnP-OVSS
View on GitHub
[CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
☆18Jul 22, 2024Updated last year
mbzuai-oryx / Video-R2
View on GitHub
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
☆19Jan 21, 2026Updated 6 months ago
FreedomIntelligence / LongLLaVA
View on GitHub
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211Jan 6, 2025Updated last year
HYUNJS / STOV-TAL
View on GitHub
[WACV-2025] Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization
☆17May 28, 2025Updated last year
yellow-binary-tree / HawkEye
View on GitHub
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆47Apr 29, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
yingsen1 / UniMD
View on GitHub
UniMD: Towards Unifying Moment retrieval and temporal action Detection
☆57Jul 5, 2024Updated 2 years ago
longvideobench / LongVideoBench
View on GitHub
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆133Jul 27, 2024Updated last year
WangWenhao0716 / PDF-Embedding
View on GitHub
[NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"
☆18Oct 1, 2024Updated last year
RenShuhuai-Andy / NBP
View on GitHub
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆42Feb 12, 2025Updated last year
OpenGVLab / PVC
View on GitHub
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆54Jun 12, 2025Updated last year
alanaai / EVUD
View on GitHub
Egocentric Video Understanding Dataset (EVUD)
☆34Jul 4, 2024Updated 2 years ago
Andy-Cheng / TEMPURA
View on GitHub
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆27Jun 4, 2025Updated last year
SUSTechBruce / LOOK-M
View on GitHub
[EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆103Nov 9, 2024Updated last year
Share14 / ShareGemini
View on GitHub
☆32Jul 29, 2024Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
xjtupanda / Sparrow
View on GitHub
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆48Sep 3, 2025Updated 10 months ago
ruili33 / TPO
View on GitHub
☆41Sep 9, 2025Updated 10 months ago
longrongyang / STGC
View on GitHub
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
☆13Feb 11, 2025Updated last year
yu2hi13 / P2SAM
View on GitHub
Official implementation for P2SAM (ACM MM 2024)
☆14Dec 7, 2024Updated last year
yeliudev / R2-Tuning
View on GitHub
🌀 R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)
☆92Jul 2, 2024Updated 2 years ago
appletea233 / LLaVA-ST
View on GitHub
[CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
☆84Jul 4, 2025Updated last year
rxtan2 / Koala-video-llm
View on GitHub
☆37Sep 16, 2024Updated last year
slonetime / EBSeg
View on GitHub
[CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
☆41Jan 12, 2026Updated 6 months ago
RUCBM / ICLEval
View on GitHub
☆14Jun 24, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
mu-cai / TemporalBench
View on GitHub
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆40Nov 10, 2024Updated last year
Sanster / padding_free_llm_train
View on GitHub
☆16Feb 6, 2024Updated 2 years ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
View on GitHub
☆29Apr 8, 2025Updated last year
jylins / hourllava
View on GitHub
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19Jun 24, 2025Updated last year
PolyU-ChenLab / ETBench
View on GitHub
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74Jan 20, 2025Updated last year
gyxxyg / TRACE
View on GitHub
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling
☆156Aug 22, 2025Updated 10 months ago
yuecao0119 / MMFuser
View on GitHub
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆63Nov 5, 2024Updated last year