HITsz-TMG / Awesome-Large-Multimodal-Reasoning-Models
The development and future prospects of multimodal reasoning models.
☆88 Updated this week
Alternatives and similar repositories for Awesome-Large-Multimodal-Reasoning-Models
Users who are interested in Awesome-Large-Multimodal-Reasoning-Models are comparing it to the repositories listed below.
- Collect every awesome work about r1! ☆358 Updated last week
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆158 Updated last month
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision ☆147 Updated last week
- ☆95 Updated last month
- Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and Dee… ☆52 Updated last month
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆153 Updated last month
- ☆76 Updated last month
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆109 Updated this week
- Latest Advances on Long Chain-of-Thought Reasoning ☆289 Updated last month
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆53 Updated last month
- ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations ☆193 Updated last month
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆183 Updated this week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆121 Updated last month
- Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, and external … ☆28 Updated 8 months ago
- ☆173 Updated last month
- Awesome Agent Training ☆106 Updated this week
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆232 Updated 2 months ago
- ☆173 Updated 3 months ago
- The Next Step Forward in Multimodal LLM Alignment ☆153 Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 Updated 2 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆111 Updated 3 weeks ago
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆514 Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆133 Updated 6 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆165 Updated last week
- Collections of Papers and Projects for Multimodal Reasoning. ☆104 Updated 2 weeks ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆125 Updated last week
- Deep Reasoning Translation via Reinforcement Learning (arXiv preprint 2025); DRT: Deep Reasoning Translation via Long Chain-of-Thought (a… ☆219 Updated 2 weeks ago
- Customize your arXiv recommendation every day. ☆101 Updated last month
- Explore the Multimodal “Aha Moment” on 2B Model ☆585 Updated last month
- ☆42 Updated 2 months ago