Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆181Jan 16, 2026Updated 2 months ago
Alternatives and similar repositories for GRIT
Users that are interested in GRIT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Pixel-Level Reasoning Model trained with RL [NeuIPS25]☆282Nov 6, 2025Updated 4 months ago
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆78Jan 26, 2026Updated last month
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆27Jun 11, 2025Updated 9 months ago
- [TPAMI 2024] The official implementation of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clu…☆11Mar 19, 2024Updated 2 years ago
- ☆52Jan 13, 2026Updated 2 months ago
- [S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models☆19Feb 18, 2025Updated last year
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- ☆14Jul 8, 2023Updated 2 years ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated 10 months ago
- [AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning☆23Dec 2, 2025Updated 3 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆77Nov 20, 2025Updated 4 months ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆43Mar 2, 2026Updated 3 weeks ago
- [ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"☆25Feb 21, 2025Updated last year
- ☆28Feb 10, 2025Updated last year
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆212Oct 15, 2025Updated 5 months ago
- Make Large Multimodal Models excel in object detection, ICCV 2025☆63Aug 1, 2025Updated 7 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆131Jul 24, 2025Updated 8 months ago
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆439Dec 22, 2024Updated last year
- Enhancing Ultrahigh Resolution Remote Sensing Imagery Analysis With ImageRAG [GRSM]☆29Feb 4, 2026Updated last month
- ☆14Jun 6, 2023Updated 2 years ago
- ☆121Jul 22, 2025Updated 8 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"☆46Feb 26, 2026Updated 3 weeks ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆29Apr 27, 2024Updated last year
- ☆41Jun 9, 2025Updated 9 months ago
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)☆311Feb 8, 2026Updated last month
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆111Oct 10, 2024Updated last year
- Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025]☆43Jun 29, 2025Updated 8 months ago
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.☆27Nov 18, 2025Updated 4 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Jul 1, 2025Updated 8 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Jul 22, 2025Updated 8 months ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆56Jan 22, 2026Updated 2 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆353Apr 20, 2025Updated 11 months ago
- [IGARSS 2024] Code for "CLIP-Guided Source-Free Object Detection in Aerial Images"☆28Dec 2, 2024Updated last year
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning☆13Jun 7, 2025Updated 9 months ago
- ☆16Jul 23, 2024Updated last year
- Multimodal RewardBench☆64Feb 21, 2025Updated last year
- [CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆99Apr 14, 2025Updated 11 months ago
- [NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing☆33Dec 9, 2025Updated 3 months ago
- [AAAI 2024] Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization☆27Apr 8, 2025Updated 11 months ago