[ICLR 2026 Oral] Visual Planning: Let's Think Only with Images
☆304 Updated this week
Alternatives and similar repositories for VisualPlanning
Users interested in VisualPlanning are comparing it to the repositories listed below
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision ☆33 Dec 2, 2025 Updated 2 months ago
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆405 Jan 29, 2026 Updated last month
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆2,305 Oct 29, 2025 Updated 4 months ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images. ☆354 Jun 1, 2025 Updated 9 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆128 Jul 24, 2025 Updated 7 months ago
- Code and updates for the ScoreRS project. ☆40 Sep 19, 2025 Updated 5 months ago
- 3D LiDAR place recognition targeting the heterogeneous robots scenario ☆30 Feb 9, 2026 Updated 3 weeks ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆57 Sep 12, 2025 Updated 5 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆1,338 Feb 3, 2026 Updated 3 weeks ago
- ENM-MCL: Efficient Neural Map for Monte Carlo Localization ☆42 Mar 30, 2025 Updated 11 months ago
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka ☆324 Jun 21, 2025 Updated 8 months ago
- Paper list for LLM/MLLM-based image segmentation ☆47 Dec 24, 2025 Updated 2 months ago
- [CoRL 2025] CogniPlan: Uncertainty-Guided Path Planning with Conditional Generative Layout Prediction - Public code and model ☆44 Jan 30, 2026 Updated last month
- I'm LPR (LiDAR Place Recognition), even if built upon a Vision Foundation Model. ☆61 Dec 1, 2025 Updated 3 months ago
- Multimodal RewardBench ☆62 Feb 21, 2025 Updated last year
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆33 Oct 12, 2024 Updated last year
- Solve Visual Understanding with Reinforced VLMs ☆5,850 Oct 21, 2025 Updated 4 months ago
- Certifiable solvers for the relative pose problem (RPp) with known gravity vector ☆13 Feb 16, 2023 Updated 3 years ago
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL ☆4,649 Updated this week
- [ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that… ☆773 Jan 26, 2026 Updated last month
- rmp data ranking ☆13 Nov 4, 2025 Updated 3 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆40 Jul 13, 2024 Updated last year
- ☆32 Jan 4, 2026 Updated last month
- [ICLR 2025] Official Pytorch Implementation of MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segm… ☆24 Apr 3, 2025 Updated 10 months ago
- Riemannian Optimization for Active Mapping with Robot Teams (ROAM) ☆51 Oct 7, 2025 Updated 4 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆402 Aug 26, 2025 Updated 6 months ago
- ☆28 Sep 2, 2025 Updated 6 months ago
- RS-MTDF: Multi-Teacher Distillation and Fusion for Remote Sensing Semi-Supervised Semantic Segmentation ☆19 Jun 15, 2025 Updated 8 months ago
- Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection ☆22 Feb 5, 2026 Updated 3 weeks ago
- TopViewRS: Vision-Language Models as Top-View Spatial Reasoners (EMNLP 2024 Oral) ☆15 Jun 14, 2025 Updated 8 months ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆278 Aug 5, 2025 Updated 6 months ago
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆149 Oct 10, 2025 Updated 4 months ago
- ☆1,129 Nov 20, 2025 Updated 3 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆108 May 29, 2025 Updated 9 months ago
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆406 Dec 15, 2024 Updated last year
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex ☆723 Updated this week
- Learn symforce together :) ☆32 Aug 23, 2022 Updated 3 years ago
- Repository for "Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics" … ☆20 Jun 16, 2024 Updated last year
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆770 Sep 7, 2025 Updated 5 months ago