microsoft / promptbench
A unified evaluation framework for large language models
☆2,465Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for promptbench
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,…☆1,836Updated 5 months ago
- ☆1,830Updated 6 months ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…☆4,653Updated this week
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".☆1,439Updated 5 months ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆2,224Updated last week
- Tools for merging pretrained large language models.☆4,816Updated 2 weeks ago
- PyTorch native finetuning library☆4,336Updated this week
- Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"☆2,178Updated last month
- [COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild☆3,996Updated this week
- ☆2,595Updated last week
- Doing simple retrieval from LLMs at various context lengths to measure accuracy☆1,565Updated 3 months ago
- General technology for enabling AI capabilities w/ LLMs and MLLMs☆3,699Updated last month
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling☆1,529Updated 4 months ago
- [ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.☆4,845Updated this week
- A curated list of Large Language Model (LLM) Interpretability resources.☆1,149Updated 3 months ago
- MTEB: Massive Text Embedding Benchmark☆1,954Updated this week
- DeepSeek LLM: Let there be answers☆1,451Updated 9 months ago
- Robust recipes to align language models with human and AI preferences☆4,680Updated last month
- [ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models☆2,052Updated 9 months ago
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.☆1,529Updated last week
- Supercharge Your LLM Application Evaluations 🚀☆7,261Updated this week
- ⚡FlashRAG: A Python Toolkit for Efficient RAG Research☆1,335Updated this week
- A generalized information-seeking agent system with Large Language Models (LLMs).☆1,104Updated 5 months ago
- Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09…☆1,948Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,057Updated 2 months ago
- A blazing fast inference solution for text embeddings models☆2,846Updated 2 weeks ago
- ☆4,035Updated 5 months ago
- Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-…☆1,256Updated 3 months ago
- prompt2model - Generate Deployable Models from Natural Language Instructions☆1,964Updated 6 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆1,634Updated this week