gokayfem / awesome-vlm-architecturesLinks

Famous Vision Language Models and Their Architectures

☆1,047

Alternatives and similar repositories for awesome-vlm-architectures

Users that are interested in awesome-vlm-architectures are comparing it to the libraries listed below

Sorting:

open-compass / VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆3,255Updated this week
2U1 / Qwen-VL-Series-Finetune
An open-source implementaion for fine-tuning Qwen-VL series by Alibaba Cloud.
☆1,305Updated last week
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆911Updated 6 months ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆922Updated 2 months ago
allenai / molmo
Code for the Molmo Vision-Language Model
☆786Updated 10 months ago
TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
☆830Updated last year
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆1,698Updated last month
PKU-YuanGroup / LLaVA-CoT
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
☆2,088Updated 2 weeks ago
OpenGVLab / VisionLLM
VisionLLM Series
☆1,119Updated 8 months ago
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,045Updated 11 months ago
microsoft / LLM2CLIP
LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.
☆557Updated 3 months ago
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆873Updated 7 months ago
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,412Updated 8 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆723Updated last month
JindongGu / Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation …
☆494Updated 7 months ago
beichenzbc / Long-CLIP
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
☆860Updated last year
zli12321 / Vision-Language-Models-Overview
A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.
☆410Updated last week
Computer-Vision-in-the-Wild / CVinW_Readings
A collection of papers on the topic of ``Computer Vision in the Wild (CVinW)''
☆1,339Updated last year
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆871Updated 2 months ago
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆670Updated 2 months ago
DAMO-NLP-SG / VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,235Updated 9 months ago
DAMO-NLP-SG / VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
☆1,019Updated 2 months ago
zjysteven / lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…
☆345Updated 8 months ago
yunlong10 / Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
☆2,860Updated 3 weeks ago
friedrichor / Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
☆283Updated 3 months ago
JackYFL / awesome-VLLMs
This repository collects papers on VLLM applications. We will update new papers irregularly.
☆171Updated last month
LLaVA-VL / LLaVA-NeXT
☆4,344Updated last month
showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆729Updated 3 weeks ago
showlab / Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
☆880Updated last month
apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,378Updated 2 months ago