This repository is continuously updated with the latest papers, technical reports, and benchmarks on multimodal reasoning!
☆55 · Mar 21, 2025 · Updated 11 months ago
Alternatives and similar repositories for Mind_with_eyes_Awesome_MLLMs_Reasoning
Users who are interested in Mind_with_eyes_Awesome_MLLMs_Reasoning are comparing it to the repositories listed below.
- [CVPR 2025 Highlight] Official repository for CoMM Dataset ☆51 · Dec 31, 2024 · Updated last year
- ☆12 · Jul 16, 2025 · Updated 7 months ago
- ☆14 · Feb 24, 2025 · Updated last year
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward ☆60 · Nov 27, 2025 · Updated 3 months ago
- ☆11 · Updated this week
- ☆19 · May 19, 2024 · Updated last year
- [CVPR '25] Interleaved-Modal Chain-of-Thought ☆106 · Dec 30, 2025 · Updated 2 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆136 · Aug 5, 2025 · Updated 7 months ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆84 · Jul 10, 2025 · Updated 8 months ago
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆195 · Mar 17, 2025 · Updated 11 months ago
- ☆19 · Dec 6, 2023 · Updated 2 years ago
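- Train a DeepSeek-R1-like reasoning LLM with ease ☆18 · Feb 15, 2025 · Updated last year
- MCP DeepResearch Server: a deep research server built on LangGraph + Ollama + Tavily, supporting asynchronous execution, timeout control, and progress notifications ☆31 · Jun 16, 2025 · Updated 8 months ago
- This project uses the PaddlePaddle platform to build an innovative multi-model collaborative system that converts PDF files to Markdown efficiently and accurately. ☆27 · Mar 25, 2025 · Updated 11 months ago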
- Rough LLM Interpreter of ComfyUI ☆28 · Jan 23, 2025 · Updated last year
- R1-Vision: Let's first take a look at the image ☆48 · Feb 16, 2025 · Updated last year
- Explores 7 popular datasets for visual object tracking: OTB, UAV, VOT, LaSOT, NFS, TrackingNet, and GOT-10k. ☆25 · Feb 1, 2020 · Updated 6 years ago
- (ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning ☆58 · Sep 30, 2025 · Updated 5 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆107 · Apr 25, 2025 · Updated 10 months ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) ☆32 · May 15, 2023 · Updated 2 years ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso… ☆130 · Mar 18, 2025 · Updated 11 months ago
- A Complete Introduction to Building Generative AI Apps with Dify ☆17 · May 25, 2025 · Updated 9 months ago
- A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs). ☆41 · May 22, 2025 · Updated 9 months ago
- A simple WeChat Official Account layout tool based on Dify ☆17 · Jun 27, 2025 · Updated 8 months ago
- ☆26 · Feb 28, 2026 · Updated last week
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025] ☆45 · Jul 22, 2025 · Updated 7 months ago
- R1-like Computer-use Agent ☆89 · Mar 21, 2025 · Updated 11 months ago
- Writes database metadata into Dify Knowledge ☆12 · Dec 30, 2025 · Updated 2 months ago
- ☆11 · Aug 29, 2025 · Updated 6 months ago
- Our survey's paper list on Agentic AI, continuously updated with the latest research. ☆90 · Oct 28, 2025 · Updated 4 months ago
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte… ☆28 · Feb 13, 2026 · Updated 3 weeks ago
- Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space" ☆65 · Dec 17, 2025 · Updated 2 months ago
- ☆28 · Dec 4, 2025 · Updated 3 months ago
- Workflow automation, but you just describe what you want and it happens. ☆27 · Nov 22, 2025 · Updated 3 months ago
- [NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models ☆75 · May 31, 2025 · Updated 9 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆156 · Dec 24, 2024 · Updated last year
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆85 · Mar 21, 2024 · Updated last year
- ☆12 · Jun 28, 2024 · Updated last year
- GUIEvalKit: Open-source Evaluation Toolkit for GUI Agents ☆19 · Feb 26, 2026 · Updated last week