stanfordmlgroup/MedAgentBench

MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents

☆308

Alternatives and similar repositories for MedAgentBench

Users interested in MedAgentBench are comparing it to the repositories listed below.

SamuelSchmidgall / AgentClinic
Agent benchmark for medical diagnosis
☆339 · Dec 31, 2024 · Updated last year
MAXNORM8650 / MedAgentSim
MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions, MICCAI 2025 (oral and early accepted)
☆175 · Apr 7, 2026 · Updated 3 months ago
gersteinlab / MedicalAgentsBench
[Patterns] MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
☆82 · Mar 10, 2026 · Updated 4 months ago
univanxx / 3mdbench
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
☆24 · Sep 23, 2025 · Updated 10 months ago
yhzhu99 / MedAgentBoard
[NeurIPS 2025] MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
☆59 · Mar 13, 2026 · Updated 4 months ago
glee4810 / FHIR-AgentBench
Code and Data for FHIR-AgentBench
☆26 · Dec 15, 2025 · Updated 7 months ago
jinlab-imvr / MedAgent-Pro
[2026 ICLR] The official code for MedAgent_Pro
☆182 · May 12, 2026 · Updated 2 months ago
wshi83 / EhrAgent
[EMNLP'24] EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records
☆137 · Dec 26, 2024 · Updated last year
yczhou001 / MAM
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration
☆53 · Apr 3, 2026 · Updated 3 months ago
HealthRex / PhysicianBench
The benchmark tasks and evaluation harness for "PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments".
☆48 · Jul 10, 2026 · Updated 2 weeks ago
PKU-AICare / ConfAgents
ConfAgents: A Conformal-Guided Multi-Agent Framework for Cost-Efficient Medical Diagnosis
☆15 · Jul 22, 2026 · Updated last week
nec-research / meddxagent
MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis
☆22 · Jun 13, 2025 · Updated last year
yhzhu99 / HealthFlow
HealthFlow: Automating electronic health record analysis via a strategically self-evolving multi-agent framework
☆48 · May 18, 2026 · Updated 2 months ago
mitmedialab / MDAgents
Official implementation for NeurIPS'24 paper: MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making
☆288 · Nov 10, 2024 · Updated last year
Yuejingkun / MedSG-Bench
[NeurIPS 2025 DB Spotlight] MedSG-Bench: A Benchmark for Medical Image Sequences Grounding
☆18 · Oct 6, 2025 · Updated 9 months ago
BlueZeros / AgentEHR
Agentic System, Tool Use, Electronic Health Record, Large Language Models, Clinical Natural Language Processing
☆24 · Apr 13, 2026 · Updated 3 months ago
mims-harvard / TxAgent
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools
☆647 · Jul 30, 2025 · Updated 11 months ago
Jeanselme / LLM-For-Survival-Analysis
☆13 · Mar 23, 2024 · Updated 2 years ago
ncbi-nlp / Clinical-Tool-Learning
☆27 · Aug 10, 2025 · Updated 11 months ago
gersteinlab / MedAgents
[ACL 2024 Findings] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning https://arxiv.org/abs/2311.10537
☆363 · May 27, 2024 · Updated 2 years ago
ARISENetwork / mast
Medical AI Superintelligence Test
☆20 · Jul 15, 2026 · Updated 2 weeks ago
ljwztc / MedChain
The repository for "MedChain: Bridging the Gap Between LLM Agents and Real-World Clinical Decision Making"
☆55 · Apr 8, 2026 · Updated 3 months ago
AgenticHealthAI / Awesome-AI-Agents-for-Healthcare
Latest Advances on Agentic AI & AI Agents for Healthcare
☆1,187 · Updated this week
MAGIC-AI4Med / M3Builder
The official codes for "M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging"
☆45 · Jul 28, 2025 · Updated last year
Wangyixinxin / MMedAgent
Learning to Use Medical Tools with Multi-modal Agent
☆267 · Mar 18, 2026 · Updated 4 months ago
MAGIC-AI4Med / DiagGym
A virtual clinical environment for self‑evolving LLM diagnostic agents.
☆108 · Feb 12, 2026 · Updated 5 months ago
chenxz1111 / RareBench
[KDD2024 ADS Track] RareBench: Can LLMs Serve as Rare Diseases Specialists?
☆39 · Nov 28, 2025 · Updated 8 months ago
jinlab-imvr / 3DMedAgent
[2026 ICML] 3DMedAgent: Unified Perception-to-Understanding for 3D Medical Analysis
☆26 · May 25, 2026 · Updated 2 months ago
AQ-MedAI / LiveClin
LiveClin is a live benchmark designed for the faithful replication of clinical practice
☆16 · Feb 27, 2026 · Updated 5 months ago
mims-harvard / ClinVec
ClinVec: Unified Embeddings of Clinical Codes Enable Knowledge-Grounded AI in Medicine
☆94 · Jun 25, 2026 · Updated last month
mims-harvard / MedLog
A protocol for event-level logging of clinical AI.
☆26 · Jun 11, 2026 · Updated last month
DATEXIS / AMEGA-benchmark
AMEGA-LLM: Autonomous Medical Evaluation for Guideline Adherence of Large Language Models
☆31 · Jun 10, 2026 · Updated last month
MAGIC-AI4Med / MedRBench
[Nature Communications] The official code for "Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases".
☆70 · Nov 7, 2025 · Updated 8 months ago
Medlinker-MG / CSEDB
CSEDB - Clinical Safety-Effectiveness Dual-Track Benchmark
☆20 · Aug 13, 2025 · Updated 11 months ago
StanfordMIMI / clin-summ
Clinical text summarization by adapting large language models
☆161 · Jul 31, 2024 · Updated last year
mhxu1998 / FlexCare
KDD 2024 | FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction
☆18 · Sep 4, 2024 · Updated last year
ncbi-nlp / TrialGPT
Code and data for TrialGPT.
☆165 · Jan 24, 2025 · Updated last year
gzxiong / MedRAG
Code for the MedRAG toolkit
☆580 · May 8, 2025 · Updated last year
Dyke-F / LLM_RAG_Agent
☆75 · Jul 18, 2024 · Updated 2 years ago