Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆185Jan 16, 2026Updated 3 months ago
Alternatives and similar repositories for GRIT
Users that are interested in GRIT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Pixel-Level Reasoning Model trained with RL [NeuIPS25]☆293Nov 6, 2025Updated 5 months ago
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆86Jan 26, 2026Updated 3 months ago
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆28Jun 11, 2025Updated 10 months ago
- [TPAMI 2024] The official implementation of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clu…☆12Mar 19, 2024Updated 2 years ago
- [S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models☆19Feb 18, 2025Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- [CVPR 2025 Highlight] Official Pytorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"☆60Aug 15, 2025Updated 8 months ago
- ☆14Jul 8, 2023Updated 2 years ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated 11 months ago
- Code for "PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection"☆27Apr 21, 2026Updated 2 weeks ago
- [AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning☆22Dec 2, 2025Updated 5 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆80Nov 20, 2025Updated 5 months ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆48Mar 2, 2026Updated 2 months ago
- ☆29Feb 10, 2025Updated last year
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆212Oct 15, 2025Updated 6 months ago
- Make Large Multimodal Models excel in object detection, ICCV 2025☆64Aug 1, 2025Updated 9 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆147Apr 15, 2026Updated 2 weeks ago
- ☆155Apr 12, 2026Updated 3 weeks ago
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆443Dec 22, 2024Updated last year
- Official repository for the ACL 2025 Findings paper "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal M…☆26Feb 21, 2025Updated last year
- ☆14Jun 6, 2023Updated 2 years ago
- Enhancing Ultrahigh Resolution Remote Sensing Imagery Analysis With ImageRAG [GRSM]☆31Feb 4, 2026Updated 3 months ago
- ☆123Jul 22, 2025Updated 9 months ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"☆45Feb 26, 2026Updated 2 months ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆29Apr 27, 2024Updated 2 years ago
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)☆326Feb 8, 2026Updated 2 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆111Oct 10, 2024Updated last year
- ☆42Jun 9, 2025Updated 10 months ago
- Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025]☆45Jun 29, 2025Updated 10 months ago
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.☆27Nov 18, 2025Updated 5 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"☆29Jul 31, 2024Updated last year
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Jul 1, 2025Updated 10 months ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆57Jan 22, 2026Updated 3 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆65Jul 22, 2025Updated 9 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆368Apr 20, 2025Updated last year
- [IGARSS 2024] Code for "CLIP-Guided Source-Free Object Detection in Aerial Images"☆28Dec 2, 2024Updated last year
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning☆13Jun 7, 2025Updated 10 months ago
- ☆16Jul 23, 2024Updated last year
- Multimodal RewardBench☆68Feb 21, 2025Updated last year