felipemaiapolo / prompteval
Efficient multi-prompt evaluation of LLMs
☆19 Updated 6 months ago
Alternatives and similar repositories for prompteval
Users who are interested in prompteval are comparing it to the libraries listed below.
- Code for Language-Interfaced FineTuning for Non-Language Machine Learning Tasks. ☆126 Updated 6 months ago
- ☆28 Updated 3 months ago
- This repository contains data, code and models for contextual noncompliance. ☆22 Updated 10 months ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆18 Updated 11 months ago
- ☆39 Updated 2 years ago
- In-context Example Selection with Influences ☆15 Updated 2 years ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 Updated last year
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆106 Updated 8 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆35 Updated 9 months ago
- ☆50 Updated last year
- NLPBench: Evaluating NLP-Related Problem-solving Ability in Large Language Models ☆10 Updated last year
- ☆20 Updated last month
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20… ☆28 Updated last year
- A Survey of Hallucination in Large Foundation Models ☆54 Updated last year
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆33 Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆30 Updated 4 months ago
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models ☆46 Updated last year
- ☆29 Updated 10 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆70 Updated 6 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (paragraph-length experiments). ☆60 Updated last year
- ☆17 Updated 2 months ago
- ☆32 Updated 10 months ago
- Source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆45 Updated last month
- ☆44 Updated 9 months ago
- ☆22 Updated 5 months ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆60 Updated 2 years ago
- The TABLET benchmark for evaluating instruction learning with LLMs for tabular prediction. ☆21 Updated 2 years ago
- ☆28 Updated 7 months ago