ictnlp / LLaVA-MiniLinks

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

☆537

Alternatives and similar repositories for LLaVA-Mini

Users that are interested in LLaVA-Mini are comparing it to the libraries listed below

Sorting:

PKU-YuanGroup / Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
☆940Updated last year
UX-Decoder / LLaVA-Grounding
☆420Updated last year
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆407Updated 5 months ago
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆307Updated 5 months ago
apple / ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
☆277Updated last year
SYuan03 / MM-IFEngine
[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following
☆109Updated last month
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆394Updated 7 months ago
yeliudev / VideoMind
💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
☆270Updated 3 weeks ago
microsoft / LLM2CLIP
LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.
☆557Updated 4 months ago
PhoenixZ810 / OmniAlign-V
Official Repository of ACL 2025 paper OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
☆142Updated 8 months ago
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆228Updated 3 months ago
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆613Updated 7 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆723Updated last month
SkyworkAI / Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆575Updated last year
zjysteven / lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…
☆350Updated this week
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆421Updated 5 months ago
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆674Updated 2 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-1.5
Fully Open Framework for Democratized Multimodal Training
☆585Updated last week
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆569Updated 6 months ago
dvlab-research / VisionZip
Official repository for VisionZip (CVPR 2025)
☆366Updated 3 months ago
IVGSZ / Flash-VStream
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆241Updated 2 weeks ago
luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆244Updated last year
VARGPT-family / VARGPT
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
☆345Updated 6 months ago
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with g…
☆497Updated 2 months ago
h-zhao1997 / cobra
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
☆289Updated 9 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
☆389Updated this week
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211Updated 9 months ago
XiaomiMiMo / MiMo-VL
MiMo-VL
☆571Updated 2 months ago
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆339Updated last year
SkyworkAI / MoE-plus-plus
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
☆248Updated last year