yaotingwangofficial / Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

☆957

Alternatives and similar repositories for Awesome-MCoT

Users who are interested in Awesome-MCoT are comparing it to the repositories listed below.

Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,350 · Dec 7, 2025 · Updated 2 months ago
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆575 · Apr 13, 2025 · Updated 10 months ago
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆5,841 · Oct 21, 2025 · Updated 3 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆820 · Dec 14, 2025 · Updated 2 months ago
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,329 · Feb 3, 2026 · Updated 2 weeks ago
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆840 · May 14, 2025 · Updated 9 months ago
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,474 · Feb 8, 2025 · Updated last year
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆768 · Sep 7, 2025 · Updated 5 months ago
Osilly / Vision-R1
[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that…
☆765 · Jan 26, 2026 · Updated 3 weeks ago
showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆799 · Oct 10, 2025 · Updated 4 months ago
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆4,032 · May 19, 2025 · Updated 8 months ago
Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆2,319 · Oct 29, 2025 · Updated 3 months ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆4,599 · Feb 10, 2026 · Updated last week
BradyFU / Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
☆17,337 · Feb 7, 2026 · Updated last week
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆381 · Feb 23, 2025 · Updated 11 months ago
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆424 · Dec 22, 2024 · Updated last year
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆623 · Mar 18, 2025 · Updated 10 months ago
open-compass / VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
☆3,816 · Updated this week
JIA-Lab-research / Seg-Zero
Project page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆603 · Jan 17, 2026 · Updated last month
LightChen233 / Awesome-Long-Chain-of-Thought-Reasoning
Latest Advances on Long Chain-of-Thought Reasoning
☆609 · Jul 18, 2025 · Updated 6 months ago
gogoczh / CoMT
Code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models"
☆19 · Mar 10, 2025 · Updated 11 months ago
ChaofanTao / Autoregressive-Models-in-Vision-Survey
[TMLR 2025🔥] A survey of autoregressive models in vision.
☆786 · Nov 8, 2025 · Updated 3 months ago
LLaVA-VL / LLaVA-NeXT
☆4,562 · Sep 14, 2025 · Updated 5 months ago
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,876 · Jan 8, 2026 · Updated last month
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆94 · Sep 14, 2024 · Updated last year
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,345 · Jan 12, 2026 · Updated last month
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆116 · Feb 4, 2026 · Updated last week
EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆3,684 · Updated this week
yunlong10 / Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
☆3,076 · Dec 20, 2025 · Updated last month
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆670 · Aug 5, 2025 · Updated 6 months ago
PKU-YuanGroup / LLaVA-CoT
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
☆2,125 · Dec 12, 2025 · Updated 2 months ago
HITsz-TMG / Awesome-Large-Multimodal-Reasoning-Models
The development and future prospects of large multimodal reasoning models.
☆585 · Jan 9, 2026 · Updated last month
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆368 · Jul 24, 2025 · Updated 6 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆691 · Sep 24, 2025 · Updated 4 months ago
showlab / Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
☆979 · Sep 27, 2025 · Updated 4 months ago
FanqingM / MM-Eureka-V0
MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.
☆324 · Jun 21, 2025 · Updated 7 months ago
JiuhaiChen / BLIP3o
Official implementation of BLIP3o-Series
☆1,638 · Nov 29, 2025 · Updated 2 months ago
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74 · Jan 20, 2025 · Updated last year
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆276 · Nov 6, 2025 · Updated 3 months ago
