xtong-zhang/Chain-of-Focus

xtong-zhang / Chain-of-Focus

☆70

Alternatives and similar repositories for Chain-of-Focus

Users who are interested in Chain-of-Focus are comparing it to the repositories listed below.

MM-FIRE / FIRE
☆13 · Nov 5, 2024 · Updated last year
mat-agent / MAT-Agent
MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight)
☆96 · Dec 18, 2025 · Updated 6 months ago
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,487 · Mar 9, 2026 · Updated 4 months ago
MMKE-Bench-ICLR / MMKE-Bench
【ICLR 2025 🔥】MMKE-Bench, a challenging benchmark for evaluating diverse semantic editing in real-world scenarios.
☆23 · Apr 19, 2025 · Updated last year
Awenbocc / GEMeX-Project
Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025]
☆48 · Jun 29, 2025 · Updated last year
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆299 · Jun 4, 2026 · Updated last month
xuyang-liu16 / GlobalCom2
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆42 · Jan 27, 2026 · Updated 5 months ago
Computer-use-agents / MacOS-Agent
A powerful automation agent for macOS that enables natural language control of various system applications and services. This agent allow…
☆60 · Jun 5, 2025 · Updated last year
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆91 · Nov 20, 2025 · Updated 7 months ago
Haochen-Wang409 / TreeVGR
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
☆92 · Jan 26, 2026 · Updated 5 months ago
hlk-1135 / RadGraph
RadGraph: Extracting Clinical Entities and Relations from Radiology Reports
☆14 · Nov 22, 2022 · Updated 3 years ago
SZUHvern / MaCo
The official implementation of "Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Ma…
☆12 · Sep 13, 2024 · Updated last year
guanjinquan / CXRTrek
Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning. Releases the dataset and the model weights.
☆13 · May 26, 2025 · Updated last year
mbzuai-oryx / MIRA
[ACM MM 2025 🔥🔥 ] MIRA: A first-of-its-kind medical RAG framework that fuses image features and retrieved knowledge with dynamic contex…
☆23 · Aug 28, 2025 · Updated 10 months ago
PKU-ICST-MIPL / FineR1_ICLR2026
☆66 · Apr 4, 2026 · Updated 3 months ago
keke-nice / MedTVT-R1
CVPR2026
☆34 · Sep 18, 2025 · Updated 9 months ago
xinyan-cxy / MINT-CoT
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
☆106 · Sep 19, 2025 · Updated 9 months ago
JerrryNie / ConceptCLIP
☆26 · Jun 11, 2026 · Updated last month
Tang-xiaoxiao / 3D-RAD
[ 🎯 NeurIPS 2025 ] 3D-RAD 🩻: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
☆32 · Jun 22, 2026 · Updated 2 weeks ago
mbzuai-oryx / Agent-X
ICLR 2026: Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆43 · Apr 28, 2026 · Updated 2 months ago
TongUI-agent / TongUI-agent
[AAAI 2026] Release of code, datasets and model for our work TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for General…
☆114 · Dec 1, 2025 · Updated 7 months ago
BigTaige / MP-GUI
CVPR25
☆28 · Jul 2, 2025 · Updated last year
rui-qian / UGround
Rui Qian, Xin Yin, Chuanhang Deng, et al.: UGround: Towards Unified Visual Grounding with Unrolled Transformers (ICML 2026)
☆27 · Jun 18, 2026 · Updated 3 weeks ago
LinjieMu / MMXU
☆25 · Nov 27, 2025 · Updated 7 months ago
AV-Reasoner / AV-Reasoner
☆19 · Jul 22, 2025 · Updated 11 months ago
SimengSun / ChapterBreak
☆12 · Jun 5, 2024 · Updated 2 years ago
MShahabSepehri / MediConfusion
The dataset and evaluation code for MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical found…
☆25 · Feb 19, 2026 · Updated 4 months ago
YaooXu / Anki-for-Diego
A customized build of the open-source software Anki that simplifies some operations; a "foolproof" English-learning application
☆15 · Dec 8, 2022 · Updated 3 years ago
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆51 · Jul 7, 2025 · Updated last year
memory-eqa / MemoryEQA
MemoryEQA
☆27 · May 4, 2026 · Updated 2 months ago
EvolvingLMMs-Lab / OpenMMReasoner
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆164 · Mar 30, 2026 · Updated 3 months ago
Ceaglex / LoVA
The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc…
☆16 · Feb 27, 2025 · Updated last year
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆421 · Jan 29, 2026 · Updated 5 months ago
audio-captioning / caption-evaluation-tools
Tools for the evaluation of audio captioning.
☆19 · May 23, 2020 · Updated 6 years ago
Visual-Agent / DeepEyes
☆1,240 · Nov 20, 2025 · Updated 7 months ago
function2-llx / MMMM
[NAACL 2025] VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
☆31 · Mar 10, 2025 · Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated 11 months ago
Schuture / DeepTumorVQA
DeepTumorVQA benchmark for VLMs and Agents (10k testing samples)
☆40 · May 19, 2026 · Updated last month
EmmaSRH / ARVFM
Awesome autoregressive vision foundation models
☆26 · Dec 24, 2024 · Updated last year