HJYao00 / Awesome-Reasoning-MLLM
Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1
☆62Mar 18, 2025Updated 10 months ago
Alternatives and similar repositories for Awesome-Reasoning-MLLM
Users that are interested in Awesome-Reasoning-MLLM are comparing it to the libraries listed below
- Collections of Papers and Projects for Multimodal Reasoning.☆107Apr 25, 2025Updated 9 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆91Feb 14, 2025Updated last year
- Code for the paper "Controllable Video Captioning with an Exemplar Sentence"☆12Apr 14, 2021Updated 4 years ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆91Aug 8, 2025Updated 6 months ago
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓☆36Apr 3, 2025Updated 10 months ago
- A Holistic Embodied Cognition Benchmark☆18Apr 3, 2025Updated 10 months ago
- ☆40Dec 16, 2025Updated 2 months ago
- Latest Advances on Long Chain-of-Thought Reasoning☆609Jul 18, 2025Updated 6 months ago
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18)☆68May 9, 2025Updated 9 months ago
- ☆63Feb 4, 2026Updated last week
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- You ship an iOS app, we ship an Apple developer license.☆26Oct 3, 2025Updated 4 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆957Nov 14, 2025Updated 3 months ago
- ☆113Sep 11, 2025Updated 5 months ago
- A simple lightweight Model Context Protocol (MCP) server integration framework☆17Jan 23, 2026Updated 3 weeks ago
- This project provides a Tensor Parallel (TP) deployment tutorial for huggingface LLM models on the 910B, and can also serve as a minimal piece of TP learning code.☆30Jan 6, 2026Updated last month
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs☆36Mar 9, 2025Updated 11 months ago
- [ICCV 2025] Dynamic-VLM☆28Dec 16, 2024Updated last year
- Due to the huge vocabulary size (151,936) of Qwen models, the Embedding and LM Head weights are excessively heavy. Therefore, this projec…☆33Jan 6, 2026Updated last month
- Structured TRIZ prompt engineering for LLMs in an open, portable XML format – MIT licensed.☆14Nov 11, 2025Updated 3 months ago
- AuraMatrix is a personality analysis website that uses an LLM for evaluation. I made this for Gyanotsav-2025 to show different ways to ut…☆11Dec 22, 2025Updated last month
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆33Jul 21, 2023Updated 2 years ago
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆26May 26, 2025Updated 8 months ago
- [CVPR'25] Interleaved-Modal Chain-of-Thought☆106Dec 30, 2025Updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]☆820Dec 14, 2025Updated 2 months ago
- CoachLint is your AI coding coach. It guides you through errors instead of just solving them for you.☆23Nov 20, 2025Updated 2 months ago
- Set screen resolution on all iOS versions.☆12Sep 2, 2025Updated 5 months ago
- VibEx (vx) is a developer-friendly CLI tool that streamlines the process of working with AI coding assistants. It helps developers prepar…☆28May 17, 2025Updated 9 months ago
- MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces☆10Mar 24, 2025Updated 10 months ago
- Glitch Gremlin AI☆15Apr 5, 2025Updated 10 months ago
- Jailbroken iOS repository for package managers☆59Jul 25, 2025Updated 6 months ago
- My old 2017-2018 menu template, for iOS. Hopefully some of you find it useful.☆10Feb 15, 2022Updated 4 years ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆47Oct 30, 2025Updated 3 months ago
- Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]☆136Sep 29, 2024Updated last year
- [NeurIPS 2023] Generalized Logit Adjustment☆39Apr 21, 2024Updated last year
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆39Mar 4, 2024Updated last year
- Collection of papers and repos for multimodal chain-of-thought☆89Nov 6, 2024Updated last year
- [DMLR 2024] Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift☆38Jan 25, 2024Updated 2 years ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆86Mar 21, 2024Updated last year