felipemaiapolo / prompteval
Efficient multi-prompt evaluation of LLMs
☆19 · Updated 4 months ago
Alternatives and similar repositories for prompteval:
Users interested in prompteval are comparing it to the repositories listed below
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies ☆19 · Updated 8 months ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆17 · Updated 9 months ago
- ☆27 · Updated 9 months ago
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆46 · Updated last year
- Using Explanations as a Tool for Advanced LLMs ☆60 · Updated 7 months ago
- Can Knowledge Editing Really Correct Hallucinations? (ICLR 2025) ☆12 · Updated 2 months ago
- ☆28 · Updated last month
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆104 · Updated 6 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆28 · Updated 8 months ago
- ☆15 · Updated last month
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models" ☆53 · Updated 2 years ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆12 · Updated last year
- Code for Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks. ☆125 · Updated 5 months ago
- Learning adapter weights from task descriptions ☆16 · Updated last year
- This repository contains data, code and models for contextual noncompliance. ☆21 · Updated 9 months ago
- NLPBench: Evaluating NLP-Related Problem-solving Ability in Large Language Models ☆10 · Updated last year
- Adding new tasks to T0 without catastrophic forgetting ☆33 · Updated 2 years ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆47 · Updated 3 months ago
- Uncertainty quantification for in-context learning of large language models ☆16 · Updated last year
- Synthetic Data Generation for Evaluation ☆12 · Updated 2 months ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆35 · Updated 8 months ago
- ☆31 · Updated last year
- ✨ Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ☆16 · Updated 6 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 · Updated last year
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆35 · Updated last year
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible. ☆42 · Updated last month
- Evaluation of neuro-symbolic engines ☆35 · Updated 8 months ago
- ☆39 · Updated 2 years ago
- Accompanying code for "Boosted Prompt Ensembles for Large Language Models" ☆30 · Updated 2 years ago
- [ACL 2023]: Training Trajectories of Language Models Across Scales https://arxiv.org/pdf/2212.09803.pdf ☆23 · Updated last year