AI4LIFE-GROUP / med-safety-bench
MedSafetyBench: Evaluating and Improving the Medical Safety of LLMs [NeurIPS 2024]
☆30 Updated last month
Alternatives and similar repositories for med-safety-bench
Users who are interested in med-safety-bench are comparing it to the repositories listed below:
- ☆30 Updated 7 months ago
- (ICML 2023) Discover and Cure: Concept-aware Mitigation of Spurious Correlation ☆41 Updated last year
- ☆26 Updated last year
- EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images, NeurIPS 2023 D&B ☆85 Updated last year
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆29 Updated 8 months ago
- Official Code Repository for the paper "Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-intensive Tasks… ☆40 Updated 9 months ago
- [NeurIPS 2024 Datasets and Benchmark Track Oral] MedCalc-Bench: Evaluating Large Language Models for Medical Calculations ☆70 Updated last month
- [EMNLP2024] Benchmark for "Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark" ☆26 Updated 9 months ago
- ☆32 Updated last year
- Repo for the paper Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions ☆40 Updated last month
- ☆25 Updated 2 months ago
- source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆53 Updated 4 months ago
- Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records ☆23 Updated last year
- A new collection of medical VQA dataset based on MIMIC-CXR. Part of the work 'EHRXQA: A Multi-Modal Question Answering Dataset for Electr… ☆88 Updated 11 months ago
- code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" ☆130 Updated last year
- Representation Surgery for Multi-Task Model Merging. ICML, 2024. ☆46 Updated 10 months ago
- [ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding ☆101 Updated last month
- KAIST medical VL research group ☆19 Updated 8 months ago
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models ☆41 Updated 4 months ago
- MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning https://arxiv.org/abs/2503.07459 ☆54 Updated 3 weeks ago
- ☆60 Updated 5 months ago
- code for EMNLP 2024 paper: How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for M… ☆13 Updated 9 months ago
- [ICML 2023] Official repository of paper: Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repe… ☆25 Updated 3 weeks ago
- Implementation of Concept-level Debugging of Part-Prototype Networks ☆12 Updated 2 years ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆59 Updated 10 months ago
- A Paper collection for LLM based Patient Simulators ☆52 Updated 2 months ago
- Code for "A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models" ☆16 Updated last month
- [NeurIPS 2023] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback" ☆19 Updated last year
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con… ☆44 Updated last year
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆26 Updated last year