microsoft / promptbench
A unified evaluation framework for large language models
☆2,569 · Updated last month
Alternatives and similar repositories for promptbench:
Users who are interested in promptbench are comparing it to the libraries listed below.
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,767 · Updated 7 months ago
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,… ☆2,009 · Updated 10 months ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆2,453 · Updated last month
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,695 · Updated 2 months ago
- Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" ☆2,324 · Updated 3 months ago
- A framework for prompt tuning using Intent-based Prompt Calibration ☆2,426 · Updated 4 months ago
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,496 · Updated 9 months ago
- ☆1,982 · Updated 10 months ago
- [ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning. ☆4,942 · Updated 4 months ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆4,969 · Updated 2 weeks ago
- ☆2,459 · Updated this week
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,640 · Updated 8 months ago
- AgentTuning: Enabling Generalized Agent Abilities for LLMs ☆1,402 · Updated last year
- Tools for merging pretrained large language models. ☆5,458 · Updated this week
- MTEB: Massive Text Embedding Benchmark ☆2,338 · Updated this week
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,269 · Updated 3 months ago
- prompt2model - Generate Deployable Models from Natural Language Instructions ☆1,983 · Updated 2 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,328 · Updated last month
- Must-read Papers on LLM Agents. ☆2,232 · Updated last month
- Toolkit for creating, sharing and using natural language prompts. ☆2,803 · Updated last year
- The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey. ☆740 · Updated 10 months ago
- A comprehensive guide to building RAG-based LLM applications for production. ☆1,781 · Updated 7 months ago
- Supercharge Your LLM Application Evaluations 🚀 ☆8,563 · Updated this week
- An Open-source Toolkit for LLM Development ☆2,762 · Updated 2 months ago
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,697 · Updated 7 months ago
- Efficient Retrieval Augmentation and Generation Framework ☆1,495 · Updated 2 months ago
- [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct ☆2,001 · Updated 4 months ago
- PyTorch native post-training library ☆5,014 · Updated this week
- ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting wit… ☆1,047 · Updated last year
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,466 · Updated 9 months ago