felipemaiapolo / prompteval
Efficient multi-prompt evaluation of LLMs
☆27 Updated last year
Alternatives and similar repositories for prompteval
Users who are interested in prompteval are comparing it to the libraries listed below:
- Optimize Any User-defined Compound AI Systems ☆66 Updated 5 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆112 Updated last year
- Evaluating LLMs with fewer examples ☆169 Updated last year
- Discovering Data-driven Hypotheses in the Wild ☆127 Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆80 Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆102 Updated last year
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆107 Updated 2 years ago
- ☆52 Updated 10 months ago
- ☆94 Updated 11 months ago
- [EMNLP'24] EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records ☆118 Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆48 Updated last year
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆153 Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 Updated 6 months ago
- Extending Conformal Prediction to LLMs ☆69 Updated last year
- PyTorch library for Active Fine-Tuning ☆96 Updated 3 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆71 Updated last year
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 Updated last year
- ☆42 Updated last year
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆47 Updated 10 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆161 Updated 6 months ago
- A toolkit to induce interpretable workflows from raw computer-use activities. ☆35 Updated 2 months ago
- Using Explanations as a Tool for Advanced LLMs ☆68 Updated last year
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆222 Updated last month
- ☆58 Updated 3 months ago
- Aioli: A unified optimization framework for language model data mixing ☆32 Updated last year
- LLM Attributor: Attribute LLM's Generated Text to Training Data ☆70 Updated 4 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 Updated last year
- OLAPH: Improving Factuality in Biomedical Long-form Question Answering ☆37 Updated last year
- ☆28 Updated 10 months ago
- ☆129 Updated last year