gersteinlab/medagents-benchmark

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

☆76

Alternatives and similar repositories for medagents-benchmark

Users who are interested in medagents-benchmark compare it to the repositories listed below.

yhzhu99 / MedAgentBoard
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
☆46 · Oct 5, 2025 · Updated 4 months ago
ritaranx / ClinGen
[ACL 2024 Findings] This is the code for our paper "Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation wi…
☆41 · Jun 23, 2024 · Updated last year
UARK-AICV / FG-CXR
The repository of the ACCV 2024 paper "FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Ge…
☆11 · Jul 28, 2025 · Updated 7 months ago
multimodallearning / DG-TTA
DG-TTA
☆14 · Apr 3, 2025 · Updated 11 months ago
DDVD233 / QoQ_Med
☆43 · Jul 31, 2025 · Updated 7 months ago
wshi83 / MedAgentGym
[ICLR'26] MedAgentGYM: Training LLM Agents for Code-Based Medical Reasoning at Scale
☆81 · Feb 2, 2026 · Updated last month
mazurowski-lab / single-image-test-time-adaptation
Single-image test-time domain adaptation for segmentation models.
☆26 · Apr 16, 2024 · Updated last year
paulhager / MIMIC-Clinical-Decision-Making-Framework
Code repository for the framework for clinical decision-making tasks using the MIMIC-CDM dataset.
☆49 · Feb 7, 2025 · Updated last year
baeseongsu / Clinical-LLM-FineTuning-HandsOn
Hands-on repository for fine-tuning Large Language Models (LLMs) in the clinical domain with tutorials
☆13 · Jan 9, 2026 · Updated last month
barthelemymp / TULIP-TCR
☆14 · May 15, 2024 · Updated last year
Zheng321 / Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis
This repo contains the core code for the paper "Deep Reinforcement Learning for Cost-Effective Medical Diagnosis".
☆13 · Apr 7, 2023 · Updated 2 years ago
zenghy96 / Reliable-Source-Approximation
Reliable Source Approximation: Source-Free Domain Adaptation for Vestibular Schwannoma MRI Segmentation
☆11 · Dec 28, 2024 · Updated last year
mitmedialab / MDAgents
Official implementation for NeurIPS'24 paper: MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making
☆238 · Nov 10, 2024 · Updated last year
RyanWangZf / PromptEHR
EMNLP'22 | PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning
☆32 · Jun 8, 2023 · Updated 2 years ago
SamuelSchmidgall / AgentClinic
Agent benchmark for medical diagnosis
☆278 · Dec 31, 2024 · Updated last year
baeseongsu / awesome-machine-learning-for-healthcare
A curated collection of cutting-edge research at the intersection of machine learning and healthcare. This repository will be actively ma…
☆34 · Apr 12, 2025 · Updated 10 months ago
Qsingle / open-medical-r1
This repository aims to reproduce R1-Zero in the medical domain.
☆32 · Jun 11, 2025 · Updated 8 months ago
gersteinlab / MedAgents
[ACL 2024 Findings] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning https://arxiv.org/abs/2311.10537
☆323 · May 27, 2024 · Updated last year
Tang-xiaoxiao / 3D-RAD
[ 🎯 NeurIPS 2025 ] 3D-RAD 🩻: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
☆27 · Oct 28, 2025 · Updated 4 months ago
ritaranx / AceSearcher
This is the code repo for the paper AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play (NeurIPS 2025 Spotl…
☆25 · Sep 29, 2025 · Updated 5 months ago
gersteinlab / BC-Design
BC-Design: A Biochemistry-Aware Framework for High-Precision Inverse Protein Folding https://www.biorxiv.org/content/10.1101/2024.10.28.6…
☆20 · Nov 24, 2025 · Updated 3 months ago
EndoluminalSurgicalVision-IMR / PASS
[IEEE TMI 2024] PASS: Prompt tuning for both styles and semantic shapes
☆21 · Feb 12, 2025 · Updated last year
bio-mlhui / MedGround-R1
Official code of the MICCAI'25 Best-Paper-Shortlist paper "MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group…
☆36 · Sep 28, 2025 · Updated 5 months ago
SUSTechBruce / Med-UniC
Official implementation of "Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias"
☆17 · Sep 22, 2023 · Updated 2 years ago
mhxu1998 / FlexCare
KDD 2024 | FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction
☆17 · Sep 4, 2024 · Updated last year
ritaranx / RAM-EHR
[ACL 2024] This is the code for our paper "RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records".
☆41 · Sep 19, 2024 · Updated last year
siyi-wind / MDViT
[MICCAI 2023] MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets (an official implementation)
☆42 · Jan 12, 2024 · Updated 2 years ago
WeixiangYAN / ClinicalLab
[NeurIPS 2025] ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
☆126 · Aug 18, 2024 · Updated last year
ljy19970415 / AutoRG-Brain
The official code for "AutoRG-Brain: Grounded Report Generation for Brain MRI".
☆49 · Jan 6, 2026 · Updated last month
YuYang0901 / CLIP-spurious-finetune
Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning (ICML 2023)
☆19 · Dec 15, 2023 · Updated 2 years ago
CLU-UML / MedDec
☆17 · Jan 25, 2026 · Updated last month
LinjieMu / MMXU
☆21 · Nov 27, 2025 · Updated 3 months ago
ncbi-nlp / MedCalc-Bench
[NeurIPS 2024 Datasets and Benchmark Track Oral] MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
☆81 · Dec 18, 2025 · Updated 2 months ago
MrGiovanni / SMILE
☆37 · Jan 26, 2026 · Updated last month
yczhou001 / MAM
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration
☆37 · Jun 25, 2025 · Updated 8 months ago
UCSC-VLAA / m1
[ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
☆48 · Dec 21, 2025 · Updated 2 months ago
MSIIP / Uni-Med
☆46 · Nov 12, 2025 · Updated 3 months ago
zhi-xuan-chen / Reg2RG
This is the official repository for the IEEE TMI paper titled "Large Language Model with Region-Guided Referring and Grounding for CT Rep…
☆67 · Jun 28, 2025 · Updated 8 months ago
nhussein / promptsmooth
Official implementation of the paper "PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning"
☆23 · Apr 17, 2025 · Updated 10 months ago