dmis-lab / OLAPH
OLAPH: Improving Factuality in Biomedical Long-form Question Answering
☆38 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for OLAPH
- [EMNLP'24] EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records · ☆65 · Updated last month
- Official repository of the MIRAGE benchmark · ☆96 · Updated 2 weeks ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" · ☆109 · Updated 3 months ago
- MedAlign is a clinician-generated dataset for instruction following with electronic medical records · ☆89 · Updated last year
- For Med-Gemini, we relabeled the MedQA benchmark; this repo includes the annotations and analysis code · ☆35 · Updated 5 months ago
- Code for MedCPT, a model for zero-shot biomedical information retrieval · ☆140 · Updated 7 months ago
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty" · ☆68 · Updated 7 months ago
- Biomedical Question Answering Datasets · ☆79 · Updated last year
- Clinical NLP Shared Task @ NAACL'24 · ☆27 · Updated last month
- Self-verification for LLMs · ☆62 · Updated last year
- Benchmarking the medical calculation capabilities of large language models · ☆25 · Updated this week
- ISMB'24 "Self-BioRAG: Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models" · ☆41 · Updated 7 months ago
- ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery · ☆20 · Updated last week
- Public code repository for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" · ☆96 · Updated last month
- Code for the MedRAG toolkit · ☆196 · Updated last month
- [NeurIPS'22] EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records · ☆74 · Updated 4 months ago
- Code and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… · ☆23 · Updated last month
- BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance · ☆46 · Updated 9 months ago
- A comprehensive repository of reasoning tasks for Medical LLMs (and beyond) · ☆96 · Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper · ☆72 · Updated 2 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" · ☆78 · Updated 8 months ago
- The first dense retrieval model that can be prompted like an LM · ☆63 · Updated 2 months ago
- Code accompanying "How I learned to start worrying about prompt formatting" · ☆95 · Updated last month
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.… · ☆37 · Updated 4 months ago
- [NeurIPS 2024 D&B Track, Spotlight] UltraMedical: Building Specialized Generalists in Biomedicine · ☆60 · Updated last month