gersteinlab/MedAgentsBench

[Patterns] MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

☆80

Alternatives and similar repositories for MedAgentsBench

Users who are interested in MedAgentsBench are comparing it to the repositories listed below.

yhzhu99 / MedAgentBoard
[NeurIPS 2025] MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
☆54 · Mar 13, 2026 · Updated 2 months ago
ritaranx / ClinGen
[ACL 2024 Findings] This is the code for our paper "Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation wi…
☆42 · Jun 23, 2024 · Updated last year
wshi83 / MedAgentGym
[ICLR'26] MedAgentGYM: Training LLM Agents for Code-Based Medical Reasoning at Scale
☆111 · Apr 12, 2026 · Updated last month
UARK-AICV / FG-CXR
The repository of the ACCV 2024 paper "FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Ge…
☆11 · Jul 28, 2025 · Updated 9 months ago
barthelemymp / TULIP-TCR
☆14 · May 15, 2024 · Updated 2 years ago
Tang-xiaoxiao / 3D-RAD
[ 🎯 NeurIPS 2025 ] 3D-RAD 🩻: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
☆31 · Oct 28, 2025 · Updated 6 months ago
mazurowski-lab / single-image-test-time-adaptation
Single-image test-time domain adaptation for segmentation models.
☆25 · Apr 16, 2024 · Updated 2 years ago
8023looker / Med-RR
☆31 · Nov 27, 2025 · Updated 5 months ago
multimodallearning / DG-TTA
DG-TTA
☆14 · Apr 3, 2025 · Updated last year
ritaranx / AceSearcher
This is the code repo for the paper AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play (NeurIPS 2025 Spotl…
☆25 · Sep 29, 2025 · Updated 7 months ago
SamuelSchmidgall / AgentClinic
Agent benchmark for medical diagnosis
☆305 · Dec 31, 2024 · Updated last year
mitmedialab / MDAgents
Official implementation for NeurIPS'24 paper: MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making
☆262 · Nov 10, 2024 · Updated last year
RyanWangZf / PromptEHR
EMNLP'22 | PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning
☆31 · Jun 8, 2023 · Updated 2 years ago
paulhager / MIMIC-Clinical-Decision-Making-Framework
Code repository for the framework to engage in clinical decision making task using the MIMIC-CDM dataset.
☆49 · Feb 7, 2025 · Updated last year
zenghy96 / Reliable-Source-Approximation
Reliable Source Approximation: Source-Free Domain Adaptation for Vestibular Schwannoma MRI Segmentation
☆11 · Dec 28, 2024 · Updated last year
MatthewKKai / MaLP
Implementation Code for "LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination"
☆14 · Updated this week
MrGiovanni / SMILE
☆41 · Jan 26, 2026 · Updated 3 months ago
gersteinlab / MedAgents
[ACL 2024 Findings] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning https://arxiv.org/abs/2311.10537
☆345 · May 27, 2024 · Updated last year
ritaranx / BMRetriever
[EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".
☆26 · Sep 19, 2024 · Updated last year
IDEA-XL / PRESTO
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [EMNLP 2024]
☆28 · Nov 18, 2024 · Updated last year
TsinghuaC3I / MedXpertQA
[ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
☆158 · Jul 17, 2025 · Updated 10 months ago
Qsingle / open-medical-r1
This repository aims to reproduce R1-Zero in the medical domain.
☆32 · Jun 11, 2025 · Updated 11 months ago
night-chen / DyGen
[KDD'23] This is the code repo for our KDD'23 paper "DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling".
☆11 · Jun 14, 2023 · Updated 2 years ago
UCSC-VLAA / m1
[ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
☆48 · Dec 21, 2025 · Updated 5 months ago
baeseongsu / Clinical-LLM-FineTuning-HandsOn
Hands-on repository for fine-tuning Large Language Models (LLMs) in the clinical domain with tutorials
☆16 · Jan 9, 2026 · Updated 4 months ago
MSIIP / Uni-Med
☆47 · Nov 12, 2025 · Updated 6 months ago
Zheng321 / Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis
This repo contains the core codes for the paper "Deep Reinforcement Learning for Cost-Effective Medical Diagnosis".
☆14 · Apr 7, 2023 · Updated 3 years ago
zhi-xuan-chen / Reg2RG
This is the official repository for the IEEE TMI paper titled "Large Language Model with Region-Guided Referring and Grounding for CT Rep…
☆69 · Jun 28, 2025 · Updated 10 months ago
serenayj / DRKnows
Diagnostic Reasoning Knowledge Graph for Large Language Model Diagnosis Prediction
☆38 · Jul 16, 2025 · Updated 10 months ago
Wangyixinxin / MMedAgent
Learning to Use Medical Tools with Multi-modal Agent
☆255 · Mar 18, 2026 · Updated 2 months ago
AI-in-Health / ClinicBench
[EMNLP2024] Benchmark for "Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark"
☆37 · May 2, 2026 · Updated 3 weeks ago
yhzhu99 / llm4healthcare
Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data
☆29 · Jan 17, 2024 · Updated 2 years ago
MAXNORM8650 / MedAgentSim
MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions, MICCAI 2025 (oral and early accepted)
☆163 · Apr 7, 2026 · Updated last month
bowang-lab / MedRAX
MedRAX: Medical Reasoning Agent for Chest X-ray - ICML 2025
☆1,173 · Oct 31, 2025 · Updated 6 months ago
bio-mlhui / MedGround-R1
Official Code of MICCAI'25 Best-Paper-Shortlist paper "MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group…
☆40 · Sep 28, 2025 · Updated 7 months ago
XZhang97666 / MultimodalMIMIC
☆42 · May 22, 2023 · Updated 3 years ago
yiqingxyq / DocLens
Code for "DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation" (ACL 2024)
☆22 · May 18, 2024 · Updated 2 years ago
LinjieMu / MMXU
☆24 · Nov 27, 2025 · Updated 5 months ago
ljy19970415 / AutoRG-Brain
The official codes for "AutoRG-Brain: Grounded Report Generation for Brain MRI".
☆57 · Jan 6, 2026 · Updated 4 months ago