⭐ 68 · Dec 5, 2025 · Updated 5 months ago
Alternatives and similar repositories for Chain-of-Focus
Users interested in Chain-of-Focus are comparing it to the repositories listed below.
- ⭐ 13 · Nov 5, 2024 · Updated last year
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight) · ⭐ 94 · Dec 18, 2025 · Updated 5 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… · ⭐ 1,466 · Mar 9, 2026 · Updated 2 months ago
- 【ICLR 2025 🔥】MMKE-Bench, a challenging benchmark for evaluating diverse semantic editing in real-world scenarios. · ⭐ 23 · Apr 19, 2025 · Updated last year
- Official implementation of paper "HiAE: A High-Throughput Authenticated Encryption Algorithm for Cross-Platform Efficiency" · ⭐ 19 · Nov 11, 2025 · Updated 6 months ago
- Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025] · ⭐ 46 · Jun 29, 2025 · Updated 11 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS 2025] · ⭐ 295 · Nov 6, 2025 · Updated 6 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models · ⭐ 42 · Jan 27, 2026 · Updated 4 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration · ⭐ 80 · Nov 20, 2025 · Updated 6 months ago
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology · ⭐ 90 · Jan 26, 2026 · Updated 4 months ago
- RadGraph: Extracting Clinical Entities and Relations from Radiology Reports · ⭐ 14 · Nov 22, 2022 · Updated 3 years ago
- The official implementation of "Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Ma… · ⭐ 12 · Sep 13, 2024 · Updated last year
- Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning, releasing the dataset and the model weights · ⭐ 13 · May 26, 2025 · Updated last year
- ⭐ 13 · Sep 14, 2022 · Updated 3 years ago
- [ACM MM 2025 🔥🔥] MIRA: A first-of-its-kind medical RAG framework that fuses image features and retrieved knowledge with dynamic contex… · ⭐ 23 · Aug 28, 2025 · Updated 9 months ago
- ⭐ 55 · Apr 4, 2026 · Updated last month
- CVPR2026 · ⭐ 32 · Sep 18, 2025 · Updated 8 months ago
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning · ⭐ 106 · Sep 19, 2025 · Updated 8 months ago
- ⭐ 25 · Apr 20, 2026 · Updated last month
- [🎯 NeurIPS 2025] 3D-RAD 🩻: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks · ⭐ 31 · Oct 28, 2025 · Updated 7 months ago
- ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks · ⭐ 42 · Apr 28, 2026 · Updated last month
- [AAAI 2026] Release of code, datasets and model for our work TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for General… · ⭐ 111 · Dec 1, 2025 · Updated 5 months ago
- CVPR25 · ⭐ 28 · Jul 2, 2025 · Updated 10 months ago
- Rui Qian, Xin Yin, Chuanhang Deng, et al.: UGround: Towards Unified Visual Grounding with Unrolled Transformers (ICML 2026) · ⭐ 24 · May 8, 2026 · Updated 3 weeks ago
- ⭐ 24 · Nov 27, 2025 · Updated 6 months ago
- [ACCV2024 (Oral)] Official pytorch implementation of X-RGen · ⭐ 18 · Jan 20, 2025 · Updated last year
- ⭐ 19 · Jul 22, 2025 · Updated 10 months ago
- The dataset and evaluation code for MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical found… · ⭐ 25 · Feb 19, 2026 · Updated 3 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models · ⭐ 50 · Jul 7, 2025 · Updated 10 months ago
- MemoryEQA · ⭐ 25 · May 4, 2026 · Updated 3 weeks ago
- The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… · ⭐ 15 · Feb 27, 2025 · Updated last year
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" · ⭐ 420 · Jan 29, 2026 · Updated 4 months ago
- Code of the CVPR 2020 paper "Learning to Optimize on SPD Manifolds" · ⭐ 13 · Sep 12, 2020 · Updated 5 years ago
- DeepTumorVQA benchmark for VLMs and Agents (10k testing samples) · ⭐ 35 · May 19, 2026 · Updated last week
- ⭐ 1,215 · Nov 20, 2025 · Updated 6 months ago
- [NAACL 2025] VividMed: Vision Language Model with Versatile Visual Grounding for Medicine · ⭐ 30 · Mar 10, 2025 · Updated last year
- Awesome autoregressive vision foundation models · ⭐ 26 · Dec 24, 2024 · Updated last year
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training · ⭐ 109 · Jul 18, 2025 · Updated 10 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning · ⭐ 24 · Sep 9, 2024 · Updated last year