Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1
☆62Mar 18, 2025Updated last year
Alternatives and similar repositories for Awesome-Reasoning-MLLM
Users that are interested in Awesome-Reasoning-MLLM are comparing it to the libraries listed below.
- Collections of Papers and Projects for Multimodal Reasoning.☆108Apr 25, 2025Updated 11 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆93Aug 8, 2025Updated 7 months ago
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓☆36Apr 3, 2025Updated 11 months ago
- Code accompanying our EMNLP 2019 paper: "Revisiting the Evaluation of Theory of Mind through Question Answering"☆27Aug 9, 2020Updated 5 years ago
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources.☆583Mar 8, 2026Updated 3 weeks ago
- A Holistic Embodied Cognition Benchmark☆19Apr 3, 2025Updated 11 months ago
- Latest Advances on Long Chain-of-Thought Reasoning☆620Jul 18, 2025Updated 8 months ago
- ☆66Feb 4, 2026Updated last month
- DBPM is a simple algorithm designed as a lightweight plug-in without learnable parameters to enhance the performance of time series contr…☆17Mar 8, 2024Updated 2 years ago
- ☆41Dec 16, 2025Updated 3 months ago
- [NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward☆36Sep 19, 2025Updated 6 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆968Nov 14, 2025Updated 4 months ago
- The KlicStudio MCP server is a connector based on the Model Context Protocol (MCP), designed to facilitate interactions with KlicStudio s…☆20Jul 30, 2025Updated 8 months ago
- [ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent☆36Nov 29, 2024Updated last year
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization☆439Dec 16, 2025Updated 3 months ago
- Official repository of "TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection"☆11May 25, 2025Updated 10 months ago
- ☆111Sep 11, 2025Updated 6 months ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- Fetch arxiv data to LLM-friendly text☆131Feb 18, 2026Updated last month
- Developer project for getting basic API integrations working in under 5 minutes☆11Jan 30, 2026Updated 2 months ago
- The official repo for [ACM CSUR'24] "Empowering Agrifood System with Artificial Intelligence: A Survey of the Progress, Challenges and Op…☆12Dec 6, 2024Updated last year
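- This project provides a Tensor Parallel (TP) deployment tutorial for huggingface LLM models on the 910B, and can also serve as a minimal piece of TP learning code.☆32Jan 6, 2026Updated 2 months ago
- This series aims to let readers understand and implement, from scratch in basic pytorch and without relying on any other ready-made external libraries, every component of a large language model, along with the training and fine-tuning code. Readers therefore need only python, pytorch, and the most basic deep learning background.☆385Aug 28, 2025Updated 7 months ago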
- Due to the huge vocabulary size (151,936) of Qwen models, the Embedding and LM Head weights are excessively heavy. Therefore, this projec…☆35Jan 6, 2026Updated 2 months ago
- ☆14Jun 29, 2024Updated last year
- utilities to deal with videos ...☆15Jul 27, 2020Updated 5 years ago
- Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement☆17Nov 11, 2024Updated last year
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs☆38Mar 9, 2025Updated last year
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆33Jul 21, 2023Updated 2 years ago
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision☆325Jan 25, 2026Updated 2 months ago
- A list of useful Open Source tools and scrapers to gather data for LLMs☆247Feb 24, 2025Updated last year
- Official Implementation of MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models☆13Nov 1, 2025Updated 4 months ago
- Block-Recurrent Dynamics in ViTs 🦖☆34Dec 24, 2025Updated 3 months ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆382Feb 23, 2025Updated last year
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆49Oct 30, 2025Updated 5 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆39Mar 4, 2024Updated 2 years ago
- R1-like Video-LLM for Temporal Grounding☆135Jun 20, 2025Updated 9 months ago
- ☆13May 9, 2023Updated 2 years ago
- ☆35Jun 9, 2025Updated 9 months ago