microsoft / promptbench
A unified evaluation framework for large language models
☆2,505 · Updated 2 months ago
Alternatives and similar repositories for promptbench:
Users interested in promptbench are comparing it to the libraries listed below:
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,… ☆1,930 · Updated 7 months ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆2,326 · Updated 2 months ago
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆1,654 · Updated 5 months ago
- Tools for merging pretrained large language models. ☆5,113 · Updated last week
- Robust recipes to align language models with human and AI preferences ☆4,896 · Updated last month
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,609 · Updated 3 weeks ago
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆4,812 · Updated last month
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,466 · Updated 7 months ago
- A framework for prompt tuning using Intent-based Prompt Calibration ☆2,311 · Updated last month
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,205 · Updated 3 weeks ago
- A framework for few-shot evaluation of language models. ☆7,474 · Updated this week
- Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" ☆2,245 · Updated last month
- ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting wit… ☆1,011 · Updated 10 months ago
- An Open-source Toolkit for LLM Development ☆2,747 · Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,172 · Updated 4 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,879 · Updated this week
- Must-read Papers on LLM Agents. ☆2,038 · Updated 2 months ago
- Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09… ☆2,015 · Updated this week
- OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, … ☆4,496 · Updated last week
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,580 · Updated 6 months ago
- Supercharge Your LLM Application Evaluations 🚀 ☆7,889 · Updated this week
- AgentTuning: Enabling Generalized Agent Abilities for LLMs ☆1,383 · Updated last year
- [COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild ☆4,087 · Updated last month
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ... ☆1,765 · Updated 2 weeks ago
- MTEB: Massive Text Embedding Benchmark ☆2,086 · Updated this week