HJYao00 / Awesome-Reasoning-MLLM
Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1
☆50 Updated last month
Alternatives and similar repositories for Awesome-Reasoning-MLLM:
Users who are interested in Awesome-Reasoning-MLLM are comparing it to the libraries listed below:
- Collection of papers and repos for multimodal chain-of-thought ☆80 Updated 5 months ago
- ☆103 Updated 2 weeks ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆100 Updated last month
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆152 Updated last month
- Awesome-RAG-VIsion: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision ☆136 Updated last week
- ☆35 Updated 2 weeks ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆92 Updated this week
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆76 Updated 2 months ago
- ☆73 Updated 3 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆155 Updated last month
- ☆40 Updated 3 months ago
- ☆40 Updated 2 weeks ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆100 Updated 3 weeks ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆56 Updated last month
- Paper collections of multi-modal LLM for Math/STEM/Code. ☆88 Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 Updated last month
- The Next Step Forward in Multimodal LLM Alignment ☆145 Updated last month
- ☆84 Updated 2 weeks ago
- Official repository of MMDU dataset ☆89 Updated 6 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning! ☆35 Updated last month
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆108 Updated this week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆53 Updated 9 months ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆39 Updated 6 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆184 Updated 3 weeks ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆132 Updated 5 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆215 Updated 7 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆33 Updated 2 weeks ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆66 Updated last month
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆86 Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆57 Updated 4 months ago