MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents
★229 · Nov 21, 2025 · Updated 3 months ago
Alternatives and similar repositories for MedAgentBench
Users who are interested in MedAgentBench are comparing it to the libraries listed below.
- PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions (NeurIPS 2025 D&B track, Spotlight) · ★23 · Feb 11, 2026 · Updated 3 weeks ago
- NV-Reason-CXR-3B is a specialized vision-language model designed for medical reasoning and interpretation of chest X-ray images. · ★43 · Updated this week
- [NeurIPS 2025] Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking · ★22 · Oct 22, 2025 · Updated 4 months ago
- KAIST medical VL research group · ★20 · Dec 20, 2024 · Updated last year
- [CVPR 2025] MicroVQA eval and RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"… · ★32 · Nov 25, 2025 · Updated 3 months ago
- documentation used in my projects · ★16 · Feb 24, 2026 · Updated last week
- ★21 · Nov 27, 2025 · Updated 3 months ago
- The official repository of paper named 'A Refer-and-Ground Multimodal Large Language Model for Biomedicine' · ★34 · Nov 5, 2024 · Updated last year
- Joint Embedding of Deep Visual and Semantic Features for Medical Image Report Generation · ★18 · Nov 13, 2025 · Updated 3 months ago
- Agent benchmark for medical diagnosis · ★280 · Dec 31, 2024 · Updated last year
- [IEEE TMI] This is the official repository for "UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification" · ★19 · Aug 2, 2024 · Updated last year
- ★32 · Oct 18, 2024 · Updated last year
- REMed: Retrieval-Enhanced Medical prediction model · ★23 · Jan 8, 2025 · Updated last year
- [ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA" · ★25 · Feb 21, 2025 · Updated last year
- [ECCV'2024] HERGen: Elevating Radiology Report Generation with Longitudinal Data · ★28 · Jan 25, 2026 · Updated last month
- EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images (NeurIPS 2023 D&B) · ★91 · Feb 6, 2026 · Updated 3 weeks ago
- FEMR (Framework for Electronic Medical Records) provides tooling for large-scale, self-supervised learning using electronic health record… · ★162 · Feb 23, 2026 · Updated last week
- ★86 · Jun 26, 2023 · Updated 2 years ago
- ★25 · Updated this week
- [EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers". · ★23 · Sep 19, 2024 · Updated last year
- ClinVec: Unified Embeddings of Clinical Codes Enable Knowledge-Grounded AI in Medicine · ★82 · Jan 22, 2026 · Updated last month
- ★10 · Nov 7, 2022 · Updated 3 years ago
- ★17 · Aug 5, 2025 · Updated 6 months ago
- Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence" · ★19 · Jun 2, 2025 · Updated 9 months ago
- ★15 · Mar 12, 2024 · Updated last year
- [ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models · ★48 · Dec 21, 2025 · Updated 2 months ago
- Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records · ★26 · Aug 21, 2024 · Updated last year
- This is the official repository for the IEEE TMI paper titled "Large Language Model with Region-Guided Referring and Grounding for CT Rep… · ★67 · Jun 28, 2025 · Updated 8 months ago
- A wrapper around libssh2 for .NET · ★29 · Jan 21, 2026 · Updated last month
- ★12 · Oct 3, 2023 · Updated 2 years ago
- ★11 · Jun 21, 2025 · Updated 8 months ago
- Code for the AACL 2022 Paper "This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Cli… · ★12 · Nov 18, 2022 · Updated 3 years ago
- SING: SDE Inference via Natural Gradients · ★36 · Dec 9, 2025 · Updated 2 months ago
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery · ★20 · Sep 24, 2025 · Updated 5 months ago
- ★16 · Jul 1, 2025 · Updated 8 months ago
- n8n Community Node for Scrappey API - Automate web scraping and data extraction with advanced anti-bot blocking technology, seamlessl… · ★16 · Feb 2, 2026 · Updated last month
- Tutorial for TikZ · ★11 · Apr 3, 2025 · Updated 11 months ago
- A simple, interactive web tool to compare pricing and performance metrics of various AI models. · ★16 · Feb 22, 2026 · Updated last week
- Code for paper "Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs" · ★12 · Jun 11, 2025 · Updated 8 months ago