UCSB-AI/GRIT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/UCSB-AI/GRIT)

UCSB-AI / GRIT

Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"

☆190

Alternatives and similar repositories for GRIT

Users that are interested in GRIT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

TIGER-AI-Lab / Pixel-Reasoner
View on GitHub
Pixel-Level Reasoning Model trained with RL [NeuIPS25]
☆301Jun 4, 2026Updated last month
Haochen-Wang409 / TreeVGR
View on GitHub
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
☆91Jan 26, 2026Updated 6 months ago
Visual-Agent / DeepEyes
View on GitHub
☆1,253Nov 20, 2025Updated 8 months ago
zhaochen0110 / Awesome_Think_With_Images
View on GitHub
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,499Mar 9, 2026Updated 4 months ago
Gabesarch / grounded-rl
View on GitHub
☆133Jul 22, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
AntResearchNLP / ViLaSR
View on GitHub
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98Jul 27, 2025Updated last year
jungao1106 / ICoT
View on GitHub
[CVPR' 25] Interleaved-Modal Chain-of-Thought
☆112Dec 30, 2025Updated 6 months ago
saccharomycetes / mllms_know
View on GitHub
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆381Apr 20, 2025Updated last year
PKU-YuanGroup / Look-Back
View on GitHub
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
☆100Jul 10, 2025Updated last year
VTool-R1 / VTool-R1
View on GitHub
[ICLR 2026] "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"
☆200Mar 20, 2026Updated 4 months ago
AMAP-ML / UniVG-R1
View on GitHub
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
☆167Jun 2, 2025Updated last year
thunlp / Migician
View on GitHub
[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
☆90May 20, 2025Updated last year
deepcs233 / Visual-CoT
View on GitHub
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447Dec 22, 2024Updated last year
yfzhang114 / Thyme
View on GitHub
✨✨ [ICLR 2026] Think Beyond Images
☆584Sep 23, 2025Updated 10 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
marinero4972 / Open-o3-Video
View on GitHub
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆158May 1, 2026Updated 2 months ago
zzzhhzzz / Ground-R1
View on GitHub
☆46Jul 14, 2025Updated last year
HJYao00 / R1-ShareVL
View on GitHub
[NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward
☆38Sep 19, 2025Updated 10 months ago
mbzuai-oryx / VideoGLaMM
View on GitHub
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆104Apr 14, 2025Updated last year
zhaochen0110 / OpenThinkIMG
View on GitHub
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆399Jun 1, 2025Updated last year
yeliudev / VideoMind
View on GitHub
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆350Feb 8, 2026Updated 5 months ago
IDEA-Research / ChatRex
View on GitHub
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆216Oct 15, 2025Updated 9 months ago
xtong-zhang / Chain-of-Focus
View on GitHub
☆71Dec 5, 2025Updated 7 months ago
ShawnTan86 / TokenCarve
View on GitHub
This is the open-source code for TokenCarve.
☆25Jan 23, 2026Updated 6 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
maifoundations / GCoT
View on GitHub
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
☆15Aug 11, 2025Updated 11 months ago
yix8 / VisualPlanning
View on GitHub
[ICLR 2026 Oral] Visual Planning: Let's Think Only with Images
☆368Apr 24, 2026Updated 3 months ago
uclanlp / OpenVLThinker
View on GitHub
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆155May 25, 2026Updated 2 months ago
Liuziyu77 / Visual-RFT
View on GitHub
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'’
☆2,262Oct 29, 2025Updated 9 months ago
zhang9302002 / ThinkingWithVideos
View on GitHub
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102Oct 15, 2025Updated 9 months ago
Mini-o3 / Mini-o3
View on GitHub
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆423Jan 29, 2026Updated 6 months ago
jun297 / v1
View on GitHub
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
☆21Jul 20, 2026Updated last week
MikeWangWZHL / PAPO
View on GitHub
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆153Feb 4, 2026Updated 5 months ago
meetdavidwan / crg
View on GitHub
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆39Mar 4, 2024Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
real-absolute-AI / NoisyRollout
View on GitHub
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆112Sep 18, 2025Updated 10 months ago
hshjerry / VideoEspresso
View on GitHub
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140Jul 28, 2025Updated last year
maifoundations / Visionary-R1
View on GitHub
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆44Jul 2, 2025Updated last year
kesenzhao / UV-CoT
View on GitHub
☆45Jul 28, 2025Updated last year
linkangheng / PR1
View on GitHub
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆289Jul 15, 2025Updated last year
xinyan-cxy / MINT-CoT
View on GitHub
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
☆107Sep 19, 2025Updated 10 months ago
OpenGVLab / VideoChat-R1
View on GitHub
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268Oct 18, 2025Updated 9 months ago