zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
☆1,329 · Feb 3, 2026 · Updated last week
Alternatives and similar repositories for Awesome_Think_With_Images
Users who are interested in Awesome_Think_With_Images are comparing it to the repositories listed below.
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆1,349 · Dec 7, 2025 · Updated 2 months ago
- ☆1,122 · Nov 20, 2025 · Updated 2 months ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images. ☆351 · Jun 1, 2025 · Updated 8 months ago
- ☆61 · Dec 5, 2025 · Updated 2 months ago
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL ☆4,572 · Jan 29, 2026 · Updated 2 weeks ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey ☆956 · Nov 14, 2025 · Updated 3 months ago
- Solve Visual Understanding with Reinforced VLMs ☆5,833 · Oct 21, 2025 · Updated 3 months ago
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks ☆3,816 · Updated this week
- A fork to add multimodal model training to open-r1 ☆1,449 · Feb 8, 2025 · Updated last year
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆2,316 · Oct 29, 2025 · Updated 3 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25] ☆276 · Nov 6, 2025 · Updated 3 months ago
- Witness the aha moment of VLM with less than $3. ☆4,029 · May 19, 2025 · Updated 8 months ago
- Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain… ☆110 · Aug 21, 2025 · Updated 5 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆251 · Oct 17, 2025 · Updated 3 months ago
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka ☆322 · Jun 21, 2025 · Updated 7 months ago
- Latest Advances on Multimodal Large Language Models ☆17,337 · Feb 7, 2026 · Updated last week
- ☆4,552 · Sep 14, 2025 · Updated 5 months ago
- [ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that… ☆760 · Jan 26, 2026 · Updated 2 weeks ago
- Doodling our way to AGI ✏️ 🖼️ 🧠 ☆122 · May 29, 2025 · Updated 8 months ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆424 · Dec 22, 2024 · Updated last year
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆401 · Jan 29, 2026 · Updated 2 weeks ago
- One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks ☆3,635 · Updated this week
- OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images. ☆116 · Jul 11, 2025 · Updated 7 months ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆768 · Sep 7, 2025 · Updated 5 months ago
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,876 · Jan 8, 2026 · Updated last month
- Explore the Multimodal “Aha Moment” on 2B Model ☆623 · Mar 18, 2025 · Updated 10 months ago
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. ☆840 · May 14, 2025 · Updated 9 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆104 · Sep 18, 2025 · Updated 4 months ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning" ☆113 · Feb 4, 2026 · Updated last week
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆798 · Oct 10, 2025 · Updated 4 months ago
- ✨✨ [ICLR 2026] Think Beyond Images ☆578 · Sep 23, 2025 · Updated 4 months ago
- Open-source unified multimodal model ☆5,654 · Oct 27, 2025 · Updated 3 months ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆277 · Aug 5, 2025 · Updated 6 months ago
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning ☆101 · Sep 19, 2025 · Updated 4 months ago
- Official repository for VisionZip (CVPR 2025) ☆405 · Jul 21, 2025 · Updated 6 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ☆597 · Jan 17, 2026 · Updated 3 weeks ago
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆816 · Dec 14, 2025 · Updated 2 months ago
- A paper list of some recent works about Token Compression for ViT and VLM ☆828 · Updated this week
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). ☆979 · Sep 27, 2025 · Updated 4 months ago