yix8 / VisualPlanning

Visual Planning: Let's Think Only with Images

☆283

Alternatives and similar repositories for VisualPlanning

Users who are interested in VisualPlanning are comparing it to the repositories listed below.

TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆254 Updated last month
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆122 Updated 4 months ago
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆229 Updated last month
ML-GSAI / LLaDA-V
☆293 Updated last month
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆372 Updated 2 months ago
jacklishufan / LaViDa
Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding
☆178 Updated this week
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆197 Updated 4 months ago
aim-uofa / Active-o3
ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
☆75 Updated 3 weeks ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆168 Updated 6 months ago
Gabesarch / grounded-rl
☆107 Updated 4 months ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆195 Updated 5 months ago
Wakals / CoVT
☆112 Updated this week
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆310 Updated 6 months ago
zhaochen0110 / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆329 Updated 6 months ago
AntResearchNLP / ViLaSR
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆82 Updated 4 months ago
Open-Reasoner-Zero / Open-Vision-Reasoner
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…
☆144 Updated 2 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆407 Updated 3 weeks ago
VisuLogic-Benchmark / VisuLogic-Eval
☆31 Updated 3 months ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆171 Updated 2 months ago
ls-kelvin / REVPT
Code for paper: Reinforced Vision Perception with Tools
☆62 Updated 2 months ago
facebookresearch / webssl
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
☆189 Updated 7 months ago
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆71 Updated last month
weijiawu / Awesome-Visual-Reinforcement-Learning
📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.
☆347 Updated last week
yu-rp / VisualPerceptionToken
☆130 Updated 8 months ago
RifleZhang / LLaVA-Reasoner-DPO
☆103 Updated 11 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆104 Updated last month
TencentARC / SEED-Bench-R1
☆95 Updated 5 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆669 Updated 2 months ago
callsys / GMPO
Geometric-Mean Policy Optimization
☆95 Updated 3 weeks ago
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211 Updated 11 months ago