felipemaiapolo / prompteval
Efficient multi-prompt evaluation of LLMs
☆21Updated 7 months ago
Alternatives and similar repositories for prompteval
Users who are interested in prompteval are comparing it to the libraries listed below.
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"☆107Updated last year
- Code for Language-Interfaced FineTuning for Non-Language Machine Learning Tasks.☆129Updated 8 months ago
- ☆28Updated 4 months ago
- ☆39Updated 2 years ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.☆40Updated 4 months ago
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty"☆82Updated last year
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆107Updated 9 months ago
- Interpreting the latent space representations of attention head outputs for LLMs☆34Updated 11 months ago
- Discovering Data-driven Hypotheses in the Wild☆102Updated last month
- Evaluating LLMs with fewer examples☆160Updated last year
- ☆29Updated last year
- ☆50Updated 4 months ago
- A mechanistic approach for understanding and detecting factual errors of large language models.☆46Updated last year
- ☆69Updated 11 months ago
- ☆91Updated 5 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆206Updated last month
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"☆18Updated last month
- Extending Conformal Prediction to LLMs☆67Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆47Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper.☆79Updated 10 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆90Updated 7 months ago
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.☆151Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"☆86Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated 5 months ago
- Data and code for the Corr2Cause paper (ICLR 2024)☆108Updated last year
- Finding semantically meaningful and accurate prompts.☆47Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting".☆106Updated last month
- ☆41Updated last year
- OLAPH: Improving Factuality in Biomedical Long-form Question Answering☆39Updated 10 months ago
- Official implementation of "BERTs are Generative In-Context Learners"☆31Updated 4 months ago