microsoftarchive / promptbench

A unified evaluation framework for large language models

☆2,815

Alternatives and similar repositories for promptbench

Users who are interested in promptbench are comparing it to the libraries listed below.

MLGroupJLU / LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
☆1,608 · Apr 17, 2026 · Updated 3 months ago
microsoft / LLMLingua
[EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…
☆6,454 · Apr 8, 2026 · Updated 3 months ago
microsoft / promptbase
All things prompt engineering
☆5,758 · Jun 4, 2024 · Updated 2 years ago
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,252 · Updated this week
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,340 · Jul 13, 2026 · Updated last week
microsoft / promptflow
Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
☆11,185 · Jul 9, 2026 · Updated last week
guidance-ai / guidance
A guidance language for controlling large language models.
☆21,685 · May 21, 2026 · Updated 2 months ago
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆7,218 · Updated this week
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,953 · Apr 13, 2026 · Updated 3 months ago
OpenBMB / ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
☆5,701 · May 21, 2025 · Updated last year
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,494 · May 1, 2026 · Updated 2 months ago
Lightning-AI / litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
☆13,491 · Updated this week
xlang-ai / OpenAgents
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
☆4,848 · Nov 18, 2024 · Updated last year
aiwaves-cn / agents
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
☆5,947 · Sep 26, 2024 · Updated last year
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,639 · May 26, 2026 · Updated last month
microsoft / autogen
A programming framework for agentic AI
☆59,846 · Apr 15, 2026 · Updated 3 months ago
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,246 · Jun 17, 2026 · Updated last month
run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆50,962 · Updated this week
JShollaj / awesome-llm-interpretability
A curated list of Large Language Model (LLM) Interpretability resources.
☆1,629 · Feb 24, 2026 · Updated 4 months ago
microsoft / TaskWeaver
The first "code-first" agent framework for seamlessly planning and executing data analytics tasks.
☆6,174 · Mar 23, 2026 · Updated 3 months ago
neulab / prompt2model
prompt2model - Generate Deployable Models from Natural Language Instructions
☆2,016 · Dec 29, 2024 · Updated last year
vibrantlabsai / ragas
Supercharge Your LLM Application Evaluations 🚀
☆14,918 · Feb 24, 2026 · Updated 4 months ago
nlpxucan / WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
☆9,480 · Jun 7, 2025 · Updated last year
meta-llama / llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als…
☆18,417 · May 19, 2026 · Updated 2 months ago
mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆7,248 · Jul 11, 2024 · Updated 2 years ago
apple / ml-ferret
☆8,675 · Oct 9, 2024 · Updated last year
openai / evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
☆18,954 · Apr 14, 2026 · Updated 3 months ago
letta-ai / letta
Platform for stateful agents: AI with advanced memory that can learn and self-improve over time.
☆23,880 · Jul 3, 2026 · Updated 2 weeks ago
microsoft / LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
☆4,438 · Jun 17, 2026 · Updated last month
dottxt-ai / outlines
Structured Outputs
☆14,573 · Updated this week
hegelai / prompttools
Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chro…
☆3,042 · Feb 11, 2026 · Updated 5 months ago
hiyouga / LlamaFactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
☆73,397 · Updated this week
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,586 · Feb 8, 2026 · Updated 5 months ago
TencentQQGYLab / AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
☆6,814 · Mar 19, 2025 · Updated last year
Codium-ai / AlphaCodium
Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"
☆3,949 · Nov 25, 2024 · Updated last year
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,876 · Mar 21, 2026 · Updated 4 months ago
SqueezeAILab / LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
☆1,861 · Jul 10, 2024 · Updated 2 years ago
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,167 · Jan 23, 2026 · Updated 5 months ago
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,892 · Updated this week