☆63Dec 5, 2025Updated 3 months ago
Alternatives and similar repositories for Chain-of-Focus
Users who are interested in Chain-of-Focus are comparing it to the libraries listed below
- Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025]☆42Jun 29, 2025Updated 8 months ago
- The official implementation of "Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Ma…☆13Sep 13, 2024Updated last year
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆76Jan 26, 2026Updated last month
- Official implementation of paper "HiAE: A High-Throughput Authenticated Encryption Algorithm for Cross-Platform Efficiency"☆19Nov 11, 2025Updated 3 months ago
- Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning. Releases the dataset and the model weights.☆13May 26, 2025Updated 9 months ago
- CVPR2026☆25Sep 18, 2025Updated 5 months ago
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning☆102Sep 19, 2025Updated 5 months ago
- RadGraph: Extracting Clinical Entities and Relations from Radiology Reports☆13Nov 22, 2022Updated 3 years ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25]☆279Nov 6, 2025Updated 4 months ago
- [ 🎯 NeurIPS 2025 ] 3D-RAD 🩻: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks☆27Oct 28, 2025Updated 4 months ago
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight)☆88Dec 18, 2025Updated 2 months ago
- DeepTumorVQA benchmark (9262 CT images + 395k QA pairs)☆30Jul 8, 2025Updated 8 months ago
- ☆21Nov 27, 2025Updated 3 months ago
- ☆25May 12, 2025Updated 9 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training☆103Jul 18, 2025Updated 7 months ago
- [NAACL 2025] VividMed: Vision Language Model with Versatile Visual Grounding for Medicine☆28Mar 10, 2025Updated last year
- CVPR25☆26Jul 2, 2025Updated 8 months ago
- [ACCV2024 (Oral)] Official pytorch implementation of X-RGen☆19Jan 20, 2025Updated last year
- A real-time swarf detection and analysis system based on YOLO and Qwen-vl-max, providing efficient video stream processing and intelligen…☆40Aug 5, 2025Updated 7 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆46Jul 17, 2025Updated 7 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆24Sep 9, 2024Updated last year
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆37Nov 27, 2025Updated 3 months ago
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models☆51Feb 23, 2026Updated 2 weeks ago
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"☆408Jan 29, 2026Updated last month
- A curated list of vision-and-language pre-training (VLP). :-)☆62Jul 6, 2022Updated 3 years ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆120Feb 4, 2026Updated last month
- ☆132Mar 22, 2025Updated 11 months ago
- ☆1,145Nov 20, 2025Updated 3 months ago
- [CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe☆148Feb 23, 2026Updated 2 weeks ago
- [ML4H'25] MedVLThinker: Simple Baselines for Multimodal Medical Reasoning☆52Dec 21, 2025Updated 2 months ago
- ☆36Jan 9, 2026Updated 2 months ago
- 🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal rei…☆200Dec 10, 2025Updated 3 months ago
- Counterfactual Reasoning VQA Dataset☆28Nov 23, 2023Updated 2 years ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.☆354Jun 1, 2025Updated 9 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning☆82Sep 19, 2025Updated 5 months ago
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”☆34Apr 11, 2024Updated last year
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆349Apr 20, 2025Updated 10 months ago
- ☆36Feb 6, 2026Updated last month
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models☆49Jul 7, 2025Updated 8 months ago