zli12321/Vision-Language-Models-Overview

A frontier collection and survey of vision-language model papers and models, kept as a GitHub repository. Continuously updated.

☆679

Alternatives and similar repositories for Vision-Language-Models-Overview

Users who are interested in Vision-Language-Models-Overview are comparing it to the repositories listed below.

zli12321 / qa_metrics
An easy python package to run quick basic QA evaluations. This package includes standardized QA evaluation metrics and semantic evaluatio…
☆61 · Jul 18, 2025 · Updated last year
zli12321 / VideoHallu
Synthetic Video hallucination and Mitigation
☆23 · Sep 21, 2025 · Updated 10 months ago
zli12321 / Vision-SR1
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
☆175 · Mar 14, 2026 · Updated 4 months ago
zli12321 / MM-Zero
Self-evolving vision language models from zero data
☆77 · Mar 14, 2026 · Updated 4 months ago
gokayfem / awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
☆1,284 · Jan 11, 2026 · Updated 6 months ago
jingyi0000 / VLM_survey
Collection of AWESOME vision-language models for vision tasks
☆3,128 · Oct 14, 2025 · Updated 9 months ago
wuxiyang1996 / COS-PLAY
COS-PLAY: Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Game Play
☆30 · Jul 11, 2026 · Updated 2 weeks ago
JackYFL / awesome-VLLMs
This repository collects papers on VLLM applications. We will update new papers irregularly.
☆219 · Feb 23, 2026 · Updated 5 months ago
zli12321 / FFGO-Video-Customization
Video Content Customization Using First Frame
☆194 · Mar 17, 2026 · Updated 4 months ago
open-compass / VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,302 · Updated this week
MPSC-UMBC / Efficient-Vision-Language-Models-A-Survey
[2025] Efficient Vision Language Models: A Survey
☆52 · Jul 14, 2025 · Updated last year
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,664 · Jan 30, 2026 · Updated 5 months ago
BradyFU / Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
☆17,956 · Jul 2, 2026 · Updated 3 weeks ago
Hongyang-Du / VideoGPA
[ICML'26] VideoGPA is a self-supervised framework that enhances 3D consistency in Video Diffusion Models.
☆70 · Jun 6, 2026 · Updated last month
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆1,017 · May 22, 2026 · Updated 2 months ago
swordlidev / Evaluation-Multimodal-LLMs-Survey
A Survey on Benchmarks of Multimodal Large Language Models
☆156 · Jul 13, 2026 · Updated last week
umd-huang-lab / Mementos
☆32 · Feb 8, 2024 · Updated 2 years ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,435 · May 11, 2026 · Updated 2 months ago
bruno686 / VisPlay
[CVPR'26] VisPlay: Self-Evolving Vision-Language Models
☆64 · Feb 25, 2026 · Updated 5 months ago
JIA-Lab-research / VisionReasoner
[ICLR 2026] VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
☆348 · Feb 9, 2026 · Updated 5 months ago
jonyzhang2023 / awesome-embodied-vla-va-vln
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (…
☆3,403 · Jul 7, 2026 · Updated 2 weeks ago
fscdc / Awesome-Efficient-Reasoning-Models
[TMLR 2025] Efficient Reasoning Models: A Survey
☆315 · Jun 26, 2026 · Updated last month
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,014 · Jul 7, 2026 · Updated 2 weeks ago
HITsz-TMG / Awesome-Large-Multimodal-Reasoning-Models
The development and future prospects of large multimodal reasoning models.
☆614 · Jan 9, 2026 · Updated 6 months ago
OpenLAIR / nano-claude-code
☆28 · Apr 11, 2026 · Updated 3 months ago
huggingface / nanoVLM
The simplest, fastest repository for training/finetuning small-sized VLMs.
☆4,964 · Oct 27, 2025 · Updated 8 months ago
SIM-xidian / Hybrid-View-Self-supervised-Framework-for-Automatic-Modulation-Recognition
☆12 · Oct 24, 2024 · Updated last year
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,946 · Updated this week
Hongyang-Du / awesome-3d-datasets
[CVPRW'26] A collection and survey of 3D datasets
☆34 · Jun 4, 2026 · Updated last month
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
☆10,102 · Sep 22, 2025 · Updated 10 months ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64 · Nov 7, 2024 · Updated last year
leofan90 / Awesome-World-Models
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and A…
☆1,914 · Updated this week
swordlidev / Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
☆386 · Apr 29, 2025 · Updated last year
EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,336 · Updated this week
LightChen233 / Awesome-Long-Chain-of-Thought-Reasoning
Latest Advances on Long Chain-of-Thought Reasoning
☆647 · Jul 18, 2025 · Updated last year
facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆11,006 · Jul 15, 2026 · Updated last week
openai / CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
☆34,069 · Mar 25, 2026 · Updated 4 months ago
linhuixiao / Awesome-Visual-Grounding
[TPAMI 2025] Towards Visual Grounding: A Survey
☆322 · Nov 18, 2025 · Updated 8 months ago
Summu77 / V-Attack
[CVPR2026] V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
☆20 · Dec 8, 2025 · Updated 7 months ago