Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆190Jan 16, 2026Updated 5 months ago
Alternatives and similar repositories for GRIT
Users that are interested in GRIT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Pixel-Level Reasoning Model trained with RL [NeuIPS25]☆299Jun 4, 2026Updated 2 weeks ago
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆91Jan 26, 2026Updated 4 months ago
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆28Jun 11, 2025Updated last year
- [TPAMI 2024] The official implementation of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clu…☆12Mar 19, 2024Updated 2 years ago
- [S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models☆20Feb 18, 2025Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- [CVPR 2025 Highlight] Official Pytorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"☆60Aug 15, 2025Updated 10 months ago
- ☆14Jul 8, 2023Updated 2 years ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated last year
- Code for "PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection"☆32Apr 21, 2026Updated last month
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆90Nov 20, 2025Updated 6 months ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆48Mar 2, 2026Updated 3 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆214Oct 15, 2025Updated 8 months ago
- Make Large Multimodal Models excel in object detection, ICCV 2025☆65Aug 1, 2025Updated 10 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- ☆31Feb 10, 2025Updated last year
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆152May 25, 2026Updated 3 weeks ago
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆445Dec 22, 2024Updated last year
- ☆174Apr 12, 2026Updated 2 months ago
- Official repository for the ACL 2025 Findings paper "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal M…☆26May 12, 2026Updated last month
- ☆14Jun 6, 2023Updated 3 years ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"☆47Feb 26, 2026Updated 3 months ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆29Apr 27, 2024Updated 2 years ago
- Enhancing Ultrahigh Resolution Remote Sensing Imagery Analysis With ImageRAG [GRSM]☆31May 16, 2026Updated last month
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)☆340Feb 8, 2026Updated 4 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆112Oct 10, 2024Updated last year
- ☆42Jun 9, 2025Updated last year
- Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025]☆48Jun 29, 2025Updated 11 months ago
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.☆28Nov 18, 2025Updated 7 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Jul 1, 2025Updated 11 months ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆57Jan 22, 2026Updated 4 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆65Jul 22, 2025Updated 10 months ago
- [IGARSS 2024] Code for "CLIP-Guided Source-Free Object Detection in Aerial Images"☆28Dec 2, 2024Updated last year
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning☆13Jun 7, 2025Updated last year
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆377Apr 20, 2025Updated last year
- ☆16Jul 23, 2024Updated last year
- Multimodal RewardBench☆68Feb 21, 2025Updated last year
- [NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing☆33Dec 9, 2025Updated 6 months ago
- [CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆104Apr 14, 2025Updated last year
- ☆19Mar 3, 2025Updated last year