Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1
☆63Mar 18, 2025Updated last year
Alternatives and similar repositories for Awesome-Reasoning-MLLM
Users that are interested in Awesome-Reasoning-MLLM are comparing it to the libraries listed below.
- Collections of Papers and Projects for Multimodal Reasoning.☆109Apr 25, 2025Updated last year
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆95Aug 8, 2025Updated 10 months ago
- Code for the paper "Controllable Video Captioning with an Exemplar Sentence"☆12Apr 14, 2021Updated 5 years ago
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓☆36Apr 3, 2025Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆92Feb 14, 2025Updated last year
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18)☆68May 9, 2025Updated last year
- A Holistic Embodied Cognition Benchmark☆19Apr 3, 2025Updated last year
- Latest Advances on Long Chain-of-Thought Reasoning☆637Jul 18, 2025Updated 10 months ago
- ☆70Feb 4, 2026Updated 4 months ago
- Arabic Grapheme-to-Phoneme (G2P) Conversion☆15Mar 15, 2025Updated last year
- [ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval☆17Aug 24, 2022Updated 3 years ago
- Thai Grapheme to Phoneme (G2P) Wiktionary Corpus☆13Jul 25, 2022Updated 3 years ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆1,005May 22, 2026Updated 3 weeks ago
- [ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent☆37Nov 29, 2024Updated last year
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization☆351Jun 2, 2026Updated last week
- Official repository of "TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection"☆11May 25, 2025Updated last year
- ☆110Sep 11, 2025Updated 9 months ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- Developer project for getting basic API integrations working in under 5 minutes☆11May 22, 2026Updated 3 weeks ago
- The official repo for [ACM CSUR'24] "Empowering Agrifood System with Artificial Intelligence: A Survey of the Progress, Challenges and Op…☆12Dec 6, 2024Updated last year
- [Findings of EMNLP 2022] AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant☆23Sep 11, 2023Updated 2 years ago
- This project provides a Tensor Parallel (TP) deployment tutorial for huggingface LLM models on the 910B, and can also serve as a minimal piece of TP learning code.☆32Jan 6, 2026Updated 5 months ago
- My implementation of the vehicle anomaly detection from https://github.com/ShuaiBai623/AI-City-Anomaly-Detection☆10Aug 30, 2019Updated 6 years ago
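- This series aims to let readers understand and implement, from scratch, every component of a large language model along with the training and fine-tuning code, using only basic pytorch and no other off-the-shelf external libraries. Readers need only python, pytorch, and the most basic deep learning background.☆386Aug 28, 2025Updated 9 months ago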
- ICS_2020_PJ☆11Dec 25, 2020Updated 5 years ago
- utilities to deal with videos ...☆15Jul 27, 2020Updated 5 years ago
- Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement☆17Nov 11, 2024Updated last year
- ☆25Mar 17, 2024Updated 2 years ago
- Advanced Machine Learning Fall 2020 Project Repository☆12Dec 12, 2020Updated 5 years ago
- Seamless 3D Object Integration using Gaussian Splatting☆20Jul 1, 2024Updated last year
- A chatbot webui that supports various multi-modal large language models☆11May 8, 2023Updated 3 years ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆383Feb 23, 2025Updated last year
- ☆22Sep 14, 2014Updated 11 years ago
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision☆334Jan 25, 2026Updated 4 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆39Mar 4, 2024Updated 2 years ago
- R1-like Video-LLM for Temporal Grounding☆137Jun 20, 2025Updated 11 months ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- ☆13May 9, 2023Updated 3 years ago
- ☆28Aug 19, 2025Updated 9 months ago