eric-ai-lab/GRIT

eric-ai-lab / GRIT

Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"

☆187

Alternatives and similar repositories for GRIT

Users that are interested in GRIT are comparing it to the libraries listed below.

TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆295 · Nov 6, 2025 · Updated 6 months ago
Haochen-Wang409 / TreeVGR
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
☆90 · Jan 26, 2026 · Updated 3 months ago
itsvaibhav01 / Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆28 · Jun 11, 2025 · Updated 11 months ago
Gorilla-Lab-SCUT / TTAC2
[TPAMI 2024] The official implementation of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clu…
☆12 · Mar 19, 2024 · Updated 2 years ago
tianshuocong / TePA
[S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models
☆20 · Feb 18, 2025 · Updated last year
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33 · Oct 12, 2024 · Updated last year
lezhang7 / SAIL
[CVPR 2025 Highlight] Official Pytorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"
☆60 · Aug 15, 2025 · Updated 9 months ago
chinmay5 / vesselformer
☆14 · Jul 8, 2023 · Updated 2 years ago
thunlp / Migician
[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
☆89 · May 20, 2025 · Updated last year
zhangquanchen / SIFThinker
[AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
☆22 · Dec 2, 2025 · Updated 5 months ago
dyzy41 / PeftCD
Code for "PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection"
☆28 · Apr 21, 2026 · Updated last month
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆80 · Nov 20, 2025 · Updated 6 months ago
DreamMr / HR-Bench
PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…
☆49 · Mar 2, 2026 · Updated 2 months ago
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆213 · Oct 15, 2025 · Updated 7 months ago
360CVGroup / LMM-Det
Make Large Multimodal Models excel in object detection, ICCV 2025
☆65 · Aug 1, 2025 · Updated 9 months ago
GuangyanS / Sys2-LLaVA
☆31 · Feb 10, 2025 · Updated last year
uclanlp / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆149 · Apr 15, 2026 · Updated last month
Hungryyan1 / UniCorn
☆147 · Apr 12, 2026 · Updated last month
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆443 · Dec 22, 2024 · Updated last year
UCSB-AI / ProbMed
Official repository for the ACL 2025 Findings paper "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal M…
☆26 · May 12, 2026 · Updated last week
inspire-group / tta_risk
☆14 · Jun 6, 2023 · Updated 2 years ago
Gabesarch / grounded-rl
☆124 · Jul 22, 2025 · Updated 10 months ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆46 · Feb 26, 2026 · Updated 2 months ago
eric-ai-lab / Discffusion
Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
☆29 · Apr 27, 2024 · Updated 2 years ago
om-ai-lab / ImageRAG
Enhancing Ultrahigh Resolution Remote Sensing Imagery Analysis With ImageRAG [GRSM]
☆32 · May 16, 2026 · Updated last week
yeliudev / VideoMind
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆335 · Feb 8, 2026 · Updated 3 months ago
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆111 · Oct 10, 2024 · Updated last year
SJTU-DENG-Lab / R1-Zero-VSI
☆42 · Jun 9, 2025 · Updated 11 months ago
Awenbocc / GEMeX-Project
Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025]
☆46 · Jun 29, 2025 · Updated 10 months ago
CASIA-IVA-Lab / VRoPE
[EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.
☆27 · Nov 18, 2025 · Updated 6 months ago
gyhdog99 / RACRO2
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆19 · Jul 1, 2025 · Updated 10 months ago
tsunghan-wu / reverse_vlm
🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…
☆57 · Jan 22, 2026 · Updated 4 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65 · Jul 22, 2025 · Updated 10 months ago
Lans1ng / SFOD-RS
[IGARSS 2024] Code for "CLIP-Guided Source-Free Object Detection in Aerial Images"
☆27 · Dec 2, 2024 · Updated last year
PlusLabNLP / VISCO
[CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
☆13 · Jun 7, 2025 · Updated 11 months ago
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆373 · Apr 20, 2025 · Updated last year
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆68 · Feb 21, 2025 · Updated last year
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆101 · Apr 14, 2025 · Updated last year
cocoshe / I2EBench
[NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
☆33 · Dec 9, 2025 · Updated 5 months ago