felipemaiapolo / prompteval
Efficient multi-prompt evaluation of LLMs
☆26 · Updated last year
Alternatives and similar repositories for prompteval
Users interested in prompteval are comparing it to the libraries listed below.
- Codebase accompanying the Summary of a Haystack paper. ☆80 · Updated last year
- Discovering Data-driven Hypotheses in the Wild ☆124 · Updated 6 months ago
- Optimize Any User-defined Compound AI Systems ☆65 · Updated 4 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆47 · Updated 9 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆222 · Updated 2 weeks ago
- Evaluating LLMs with fewer examples ☆170 · Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆31 · Updated 11 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆102 · Updated last year
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆112 · Updated last year
- Code for the paper "ROUTERBENCH: A Benchmark for Multi-LLM Routing System" ☆153 · Updated last year
- OLAPH: Improving Factuality in Biomedical Long-form Question Answering ☆37 · Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 10 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆117 · Updated 4 months ago
- ☆43 · Updated last year
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation ☆40 · Updated 2 months ago
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty" ☆92 · Updated last year
- [ACL 2024] "Large Language Models for Automated Open-domain Scientific Hypotheses Discovery". It has also received the best poster award … ☆42 · Updated last year
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆46 · Updated 2 years ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al., COLM 2024) ☆48 · Updated 11 months ago
- ☆52 · Updated 9 months ago
- Code for the EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆55 · Updated last year
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆173 · Updated 3 weeks ago
- ☆129 · Updated last year
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆19 · Updated 6 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆107 · Updated 2 years ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆43 · Updated last year
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 · Updated 9 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Updated last year
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆66 · Updated last year
- Source code for the collaborative reasoner research project at Meta FAIR. ☆111 · Updated 8 months ago