ZhangXJ199/TinyLLaVA-Video

A Simple Framework of Small-scale LMMs for Video Understanding

☆112

Alternatives and similar repositories for TinyLLaVA-Video

Users interested in TinyLLaVA-Video are comparing it to the libraries listed below.

ZhangXJ199 / TinyLLaVA-Video-R1
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
☆115 · Dec 24, 2025 · Updated 6 months ago
ZhangXJ199 / Bench-CoE
A Framework for Collaboration of Experts from Benchmark
☆13 · Apr 27, 2025 · Updated last year
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆990 · Updated this week
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
ruili33 / TPO
☆41 · Sep 9, 2025 · Updated 10 months ago
Vincent-ZHQ / Comprehensive-Long-Video-Understanding-Survey
A survey on MM-LLMs for long video understanding: From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long…
☆23 · Sep 12, 2025 · Updated 9 months ago
DAMO-NLP-SG / VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
☆1,171 · Aug 14, 2025 · Updated 10 months ago
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
Ola-Omni / Ola
Ola: Pushing the Frontiers of Omni-Modal Language Model
☆394 · Jun 13, 2025 · Updated last year
appletea233 / LLaVA-ST
[CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
☆84 · Jul 4, 2025 · Updated last year
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆94 · Aug 8, 2025 · Updated 11 months ago
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆114 · Jun 29, 2025 · Updated last year
TencentARC / SEED-Bench-R1
☆100 · Jun 23, 2025 · Updated last year
si0wang / ThinkLite-VL
☆105 · Jun 10, 2025 · Updated last year
LUMIA-Group / PonderingLM
Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"
☆26 · Jul 21, 2025 · Updated 11 months ago
thunlp / KARL
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
☆68 · Apr 5, 2026 · Updated 3 months ago
winci-ai / CW-RGP
An official implementation of CW-RGP (NeurIPS 2022, spotlight).
☆21 · Dec 9, 2022 · Updated 3 years ago
bytedance / Valley
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, video, and audio data.
☆285 · May 8, 2026 · Updated 2 months ago
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268 · Oct 18, 2025 · Updated 8 months ago
ShuheSH / FaceID-6M
☆52 · Apr 11, 2025 · Updated last year
Richard-61 / FineAction
The official codebase of FineAction dataset. We will update the data and code of our FineAction.
☆24 · Apr 10, 2025 · Updated last year
ModalMinds / MM-PRM
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
☆30 · May 26, 2025 · Updated last year
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆730 · Aug 5, 2025 · Updated 11 months ago
lifan724 / magic_eraser
☆20 · Jul 14, 2024 · Updated last year
starhiking / de_watermark_video
watermark video delogo
☆11 · Nov 27, 2020 · Updated 5 years ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆878 · Dec 14, 2025 · Updated 6 months ago
ThisisBillhe / NAR
[ICCV 2025] The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"
☆62 · Apr 5, 2025 · Updated last year
yongliang-wu / Repurpose
[AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
☆30 · Apr 4, 2026 · Updated 3 months ago
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…
☆548 · Aug 14, 2025 · Updated 10 months ago
rongyaofang / PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
☆132 · Jan 16, 2025 · Updated last year
LINs-lab / GMem
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models
☆43 · Mar 11, 2025 · Updated last year
jwliao-ai / MARFT
☆85 · May 14, 2026 · Updated last month
EricTan7 / Veritas
[ICLR 2026 Oral] Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning.
☆118 · Feb 27, 2026 · Updated 4 months ago
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆847 · May 14, 2025 · Updated last year
Oryx-mllm / Oryx
[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
☆329 · Jul 4, 2025 · Updated last year
ByungKwanLee / TroL
[EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…
☆99 · Jun 23, 2024 · Updated 2 years ago
wzhuang-xmu / LoSA
[ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".
☆25 · Mar 16, 2025 · Updated last year
chengzu-li / MVoT
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)
☆77 · Apr 12, 2025 · Updated last year
SempraETY / Pruning-via-Merging
☆23 · Nov 26, 2024 · Updated last year