THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

☆2,383

Alternatives and similar repositories for CogVLM2

Users who are interested in CogVLM2 are comparing it to the repositories listed below.

InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,871 Updated last month
LLaVA-VL / LLaVA-NeXT
☆4,048 Updated last month
QwenLM / Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,092 Updated 11 months ago
PKU-YuanGroup / MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,196 Updated last week
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
☆8,611 Updated last week
InternLM / xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
☆4,654 Updated last week
THUDM / CogVLM
a state-of-the-art-level open visual language model | multimodal pretrained model
☆6,618 Updated last year
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,171 Updated 4 months ago
AIDC-AI / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆992 Updated last month
VITA-MLLM / VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,360 Updated 3 months ago
X-PLUG / mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
☆2,223 Updated last month
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆1,929 Updated 8 months ago
dvlab-research / MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
☆3,300 Updated last year
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,736 Updated 9 months ago
open-compass / VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆2,762 Updated this week
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,023 Updated 8 months ago
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,424 Updated this week
PKU-YuanGroup / LLaVA-CoT
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
☆2,030 Updated this week
DAMO-NLP-SG / VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,195 Updated 6 months ago
Vision-CAIR / MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
☆626 Updated 7 months ago
X-PLUG / mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,500 Updated 3 months ago
mbzuai-oryx / LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
☆840 Updated last year
PKU-YuanGroup / Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
☆3,309 Updated 7 months ago
Vchitect / Latte
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
☆1,850 Updated 3 months ago
Picsart-AI-Research / StreamingT2V
[CVPR 2025] StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
☆1,584 Updated 3 months ago
Meituan-AutoML / MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
☆1,245 Updated last year
Ucas-HaoranWei / Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
☆619 Updated 6 months ago
dvlab-research / LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆822 Updated 11 months ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆3,107 Updated last week
Alpha-VLLM / Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
☆2,208 Updated 5 months ago