HJYao00/Awesome-Reasoning-MLLM

HJYao00 / Awesome-Reasoning-MLLM

Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1

☆63

Alternatives and similar repositories for Awesome-Reasoning-MLLM

Users who are interested in Awesome-Reasoning-MLLM are comparing it to the repositories listed below.

yytzsy / SMCG
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
☆12 · Apr 14, 2021 · Updated 5 years ago
phellonchen / Awesome-MLLM-Reasoning
Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓
☆36 · Apr 3, 2025 · Updated last year
HJYao00 / MMReason
[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI
☆15 · Apr 25, 2026 · Updated 2 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆94 · Aug 8, 2025 · Updated 11 months ago
facebookresearch / ToMi
Code accompanying our EMNLP 2019 paper: "Revisiting the Evaluation of Theory of Mind through Question Answering"
☆29 · Aug 9, 2020 · Updated 5 years ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆91 · Feb 14, 2025 · Updated last year
lwpyh / Awesome-MLLM-Reasoning-Collection
A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources.
☆36 · Jul 1, 2026 · Updated 2 weeks ago
showlab / GEB-Plus
[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
☆17 · Aug 24, 2022 · Updated 3 years ago
yousefkotp / Flare-Free-Vision-Empowering-Uformer-with-Depth-Insights
The official implementation for IEEE-ICASSP 2024 paper "Flare-Free Vision: Empowering Uformer with Depth Insights"
☆18 · Aug 27, 2024 · Updated last year
jingyi0000 / R1-VL
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
☆352 · Jun 2, 2026 · Updated last month
OpenDCAI / Awesome_MLLMs_Reasoning
☆112 · Sep 11, 2025 · Updated 10 months ago
samson-wang / dwconv
☆10 · Apr 13, 2020 · Updated 6 years ago
shengyangsun / TDSD
Official repository of "TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection"
☆11 · May 25, 2025 · Updated last year
TerminologyHub / termhub-in-5-minutes
Developer project for getting basic API integrations working in under 5 minutes
☆11 · May 22, 2026 · Updated last month
PyThaiNLP / thai-g2p-wiktionary-corpus
Thai Grapheme to Phoneme (G2P) Wiktionary Corpus
☆13 · Jul 25, 2022 · Updated 3 years ago
adampower48 / AI-City-Anomaly-Detection
My implementation of the vehicle anomaly detection from https://github.com/ShuaiBai623/AI-City-Anomaly-Detection
☆10 · Aug 30, 2019 · Updated 6 years ago
StanLei52 / TQVSR
[Findings of EMNLP 2022] AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
☆24 · Sep 11, 2023 · Updated 2 years ago
baaaad / ECE
[ECCV'22 Poster] Explicit Image Caption Editing
☆22 · Nov 30, 2022 · Updated 3 years ago
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆1,016 · May 22, 2026 · Updated 2 months ago
KaihuaTang / LLM-TP-Inference-on-910B
This project provides a tutorial on Tensor Parallel (TP) deployment of Hugging Face LLM models on the 910B, and can also serve as a minimal piece of code for learning TP.
☆32 · Jan 6, 2026 · Updated 6 months ago
oshears / adv-ml-2020-snn-project
Advanced Machine Learning Fall 2020 Project Repository
☆12 · Dec 12, 2020 · Updated 5 years ago
Jiang-maomao / flare-removal
☆13 · Oct 16, 2025 · Updated 9 months ago
kyegomez / MultiModal-ToT
Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement
☆17 · Nov 11, 2024 · Updated last year
apple / ml-mia-bench
This repo contains code and data for the ICLR 2025 paper "MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs"
☆38 · Mar 9, 2025 · Updated last year
zwq2018 / Auto_star
auto star for repo lists
☆10 · Aug 26, 2023 · Updated 2 years ago
RizhaoCai / FAS_DataManager
☆20 · Aug 27, 2022 · Updated 3 years ago
jungao1106 / ICoT
[CVPR'25] Interleaved-Modal Chain-of-Thought
☆112 · Dec 30, 2025 · Updated 6 months ago
www-Ye / Time-R1
R1-like Video-LLM for Temporal Grounding
☆138 · Jun 20, 2025 · Updated last year
LightChen233 / Awesome-Long-Chain-of-Thought-Reasoning
Latest Advances on Long Chain-of-Thought Reasoning
☆645 · Jul 18, 2025 · Updated last year
Zhengyu-Li / Deep-Network-Compression-based-on-Student-Teacher-Network-
Deep Neural Network Compression based on Student-Teacher Network
☆14 · Jul 6, 2023 · Updated 3 years ago
XiPotatonium / chatbot-webui
A chatbot webui that supports various multi-modal large language models
☆11 · May 8, 2023 · Updated 3 years ago
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33 · Jul 21, 2023 · Updated 3 years ago
cg1177 / Recursive-Multimodal-Agent
☆19 · Jul 1, 2026 · Updated 3 weeks ago
dali-does / clevr-math
☆13 · May 9, 2023 · Updated 3 years ago
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆39 · Mar 4, 2024 · Updated 2 years ago
Kejie-Wang / SolarPrediction
A multi-modality model for solar irradiance forecasting
☆15 · Mar 14, 2017 · Updated 9 years ago
om-ai-lab / VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
☆138 · Apr 10, 2026 · Updated 3 months ago
BierOne / relation-vqa
Re-implementation for 'R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering'.
☆12 · Mar 13, 2026 · Updated 4 months ago
bofang98 / UATVR
[ICCV'23] UATVR: Uncertainty-Adaptive Text-Video Retrieval
☆13 · Nov 5, 2023 · Updated 2 years ago