Wang-Xiaodong1899/Awesome-Multimodal-Large-Language-Models

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Wang-Xiaodong1899/Awesome-Multimodal-Large-Language-Models)

Wang-Xiaodong1899 / Awesome-Multimodal-Large-Language-Models

🔥Awesome Multimodal Large Language Models Paper List

☆154

Alternatives and similar repositories for Awesome-Multimodal-Large-Language-Models

Users that are interested in Awesome-Multimodal-Large-Language-Models are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Wang-Xiaodong1899 / Open-R1-Video
View on GitHub
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆382Jul 1, 2026Updated 3 weeks ago
PKU-YuanGroup / LLMBind
View on GitHub
LLMBind: A Unified Modality-Task Integration Framework
☆19Jun 16, 2024Updated 2 years ago
Yaxin9Luo / Gamma-MOD
View on GitHub
[ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆45Oct 28, 2025Updated 8 months ago
ywh187 / FitPrune
View on GitHub
☆68Jan 23, 2026Updated 6 months ago
KarnaYip / C2RoPE
View on GitHub
[ICRA 26] C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning
☆27Feb 13, 2026Updated 5 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
ModalMinds / MM-EUREKA
View on GitHub
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆770Sep 7, 2025Updated 10 months ago
FanqingM / MM-Eureka-V0
View on GitHub
MM-Eureka V0 also called R1-Multimodal-Journey, Latest version is in MM-Eureka
☆325Jun 21, 2025Updated last year
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
View on GitHub
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,435May 11, 2026Updated 2 months ago
zhaochen0110 / Awesome_Think_With_Images
View on GitHub
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493Mar 9, 2026Updated 4 months ago
EdenGabriel / TaskWeave
View on GitHub
[CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
☆30Sep 26, 2024Updated last year
YuqingWang1029 / TokenBridge
View on GitHub
[ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/To…
☆158Jul 24, 2025Updated last year
Yxxxb / VoCo-LLaMA
View on GitHub
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205Jun 18, 2025Updated last year
turningpoint-ai / VisualThinker-R1-Zero
View on GitHub
Explore the Multimodal “Aha Moment” on 2B Model
☆624Mar 18, 2025Updated last year
OpenDCAI / Awesome_MLLMs_Reasoning
View on GitHub
☆112Sep 11, 2025Updated 10 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
ErikZ719 / CoTA
View on GitHub
[ICLR 26] Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective
☆16Mar 6, 2026Updated 4 months ago
UniX-AI-Lab / WorldReasonBench
View on GitHub
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
☆22May 19, 2026Updated 2 months ago
Chenyu-Wang567 / All-Angles-Bench
View on GitHub
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
☆70Mar 22, 2026Updated 4 months ago
Purshow / Awesome-Unified-Multimodal
View on GitHub
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
☆365Jan 8, 2026Updated 6 months ago
RUC-NLPIR / VideoDeepResearch
View on GitHub
☆155Nov 17, 2025Updated 8 months ago
LengSicong / MMR1
View on GitHub
[CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
☆217Sep 26, 2025Updated 9 months ago
ttengwang / Awesome_Long_Form_Video_Understanding
View on GitHub
Awesome papers & datasets specifically focused on long-term videos.
☆381Oct 9, 2025Updated 9 months ago
Wild-Cooperation-Hub / Awesome-MLLM-Reasoning-Benchmarks
View on GitHub
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
☆76Mar 18, 2025Updated last year
TIGER-AI-Lab / Pixel-Reasoner
View on GitHub
Pixel-Level Reasoning Model trained with RL [NeuIPS25]
☆301Jun 4, 2026Updated last month
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
PKU-YuanGroup / UniSandBox
View on GitHub
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆60Nov 27, 2025Updated 7 months ago
EvolvingLMMs-Lab / open-r1-multimodal
View on GitHub
A fork to add multimodal model training to open-r1
☆1,593Feb 8, 2025Updated last year
DoubtedSteam / DyVTE
View on GitHub
The official implement of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings"
☆18Dec 5, 2024Updated last year
latentcraft / replay
View on GitHub
[CVPR 2026] Boosting Reasoning in Large Multimodal Models via Activation Replay
☆24May 7, 2026Updated 2 months ago
Ziyang412 / VideoTree
View on GitHub
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆165Jun 23, 2025Updated last year
ttgeng233 / LongVALE
View on GitHub
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025))
☆61Jun 9, 2025Updated last year
mm-vl / ULM-R1
View on GitHub
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
☆48Jul 22, 2025Updated last year
tulerfeng / Video-R1
View on GitHub
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆882Dec 14, 2025Updated 7 months ago
thefcraft / prompt-generator-stable-diffusion
View on GitHub
Prompt Generator model for Stable Diffusion Models
☆12Jun 20, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
TobyYang7 / Llava_Qwen2
View on GitHub
Visual Instruction Tuning for Qwen2 Base Model
☆43Jun 29, 2024Updated 2 years ago
alenai97 / PEFT-MLLM
View on GitHub
Official Code and data for ACL 2024 finding, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"
☆25Nov 10, 2024Updated last year
TencentARC / MindOmni
View on GitHub
[NeurIPS2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
☆139Oct 15, 2025Updated 9 months ago
MCG-NJU / CaReBench
View on GitHub
A Fine-grained Benchmark for Video Captioning and Retrieval
☆30Jul 16, 2025Updated last year
NVlabs / Long-RL
View on GitHub
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726Sep 24, 2025Updated 10 months ago
yellow-binary-tree / HawkEye
View on GitHub
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆47Apr 29, 2024Updated 2 years ago
yaotingwangofficial / Awesome-MCoT
View on GitHub
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆1,017May 22, 2026Updated 2 months ago