yfzhang114/Awesome-Multimodal-Large-Language-Models

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/yfzhang114/Awesome-Multimodal-Large-Language-Models)

yfzhang114 / Awesome-Multimodal-Large-Language-Models

Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models

☆1,186

Alternatives and similar repositories for Awesome-Multimodal-Large-Language-Models

Users that are interested in Awesome-Multimodal-Large-Language-Models are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
View on GitHub
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,437May 11, 2026Updated 2 months ago
zhaochen0110 / Awesome_Think_With_Images
View on GitHub
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,497Mar 9, 2026Updated 4 months ago
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,958Updated this week
hiyouga / EasyR1
View on GitHub
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,083Updated this week
Visual-Agent / DeepEyes
View on GitHub
☆1,251Nov 20, 2025Updated 8 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
EvolvingLMMs-Lab / open-r1-multimodal
View on GitHub
A fork to add multimodal model training to open-r1
☆1,594Feb 8, 2025Updated last year
verl-project / verl
View on GitHub
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,699Updated this week
open-compass / VLMEvalKit
View on GitHub
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,307Jul 22, 2026Updated last week
modelscope / ms-swift
View on GitHub
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,974Updated this week
OpenRLHF / OpenRLHF
View on GitHub
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,855Jul 14, 2026Updated 2 weeks ago
TideDra / lmm-r1
View on GitHub
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆848May 14, 2025Updated last year
daixiangzi / Awesome-Token-Compress
View on GitHub
A paper list of some recent works about Token Compress for Vit and VLM
☆944Updated this week
Kwai-YuanQi / MM-RLHF
View on GitHub
The Next Step Forward in Multimodal LLM Alignment
☆199May 1, 2025Updated last year
yfzhang114 / Thyme
View on GitHub
✨✨ [ICLR 2026] Think Beyond Images
☆584Sep 23, 2025Updated 10 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
THUDM / slime
View on GitHub
slime is an LLM post-training framework for RL Scaling.
☆7,679Updated this week
om-ai-lab / VLM-R1
View on GitHub
Solve Visual Understanding with Reinforced VLMs
☆6,018Jul 7, 2026Updated 3 weeks ago
ATH-MaaS / Awesome-Unified-Multimodal-Models
View on GitHub
Awesome Unified Multimodal Models
☆1,306Mar 24, 2026Updated 4 months ago
ligeng0197 / Awesome-Thinking-With-Images
View on GitHub
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆113Aug 21, 2025Updated 11 months ago
Mini-o3 / Mini-o3
View on GitHub
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆423Jan 29, 2026Updated 6 months ago
FanqingM / MM-Eureka-V0
View on GitHub
MM-Eureka V0 also called R1-Multimodal-Journey, Latest version is in MM-Eureka
☆325Jun 21, 2025Updated last year
ModalMinds / MM-EUREKA
View on GitHub
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆771Sep 7, 2025Updated 10 months ago
TsinghuaC3I / Awesome-RL-for-LRMs
View on GitHub
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,470Nov 9, 2025Updated 8 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-2
View on GitHub
Fully Open Framework for Democratized Multimodal Training
☆1,154Updated this week
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
StarsfieldAI / R1-V
View on GitHub
Witness the aha moment of VLM with less than $3.
☆4,064May 19, 2025Updated last year
yaotingwangofficial / Awesome-MCoT
View on GitHub
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆1,017May 22, 2026Updated 2 months ago
yfzhang114 / SliME
View on GitHub
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆163Dec 26, 2024Updated last year
alibaba / ROLL
View on GitHub
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
☆3,333Updated this week
LLaVA-VL / LLaVA-NeXT
View on GitHub
☆4,713Jun 15, 2026Updated last month
EvolvingLMMs-Lab / lmms-eval
View on GitHub
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,337Updated this week
Liuziyu77 / Visual-RFT
View on GitHub
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'’
☆2,262Oct 29, 2025Updated 9 months ago
Osilly / Awesome-Interleaving-Reasoning
View on GitHub
Interleaving Reasoning: Next-Generation Reasoning Systems for AGI
☆281Jun 5, 2026Updated last month
QwenLM / Qwen3-VL
View on GitHub
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,682Jan 30, 2026Updated 5 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
OpenGVLab / InternVL
View on GitHub
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
☆10,106Sep 22, 2025Updated 10 months ago
showlab / Awesome-Unified-Multimodal-Models
View on GitHub
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆830Oct 10, 2025Updated 9 months ago
langfengQ / verl-agent
View on GitHub
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,158Jun 9, 2026Updated last month
Kwai-Keye / Keye
View on GitHub
☆808Jun 10, 2026Updated last month
Fancy-MLLM / R1-Onevision
View on GitHub
R1-onevision, a visual language model capable of deep CoT reasoning.
☆581Apr 13, 2025Updated last year
ByteDance-Seed / VeOmni
View on GitHub
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
☆2,111Updated this week
HHHHHejia / Awesome-AgenticLLM-RL-Papers
View on GitHub
☆1,849Jun 18, 2026Updated last month