openai/evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

☆18,014

Alternatives and similar repositories for evals

Users who are interested in evals are comparing it to the libraries listed below.

openai / openai-cookbook
Examples and guides for using the OpenAI API
☆72,193 · Updated this week
openai / chatgpt-retrieval-plugin
The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
☆21,226 · Jul 4, 2024 · Updated last year
run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆47,753 · Updated this week
langchain-ai / langchain
The agent engineering platform
☆129,503 · Updated this week
tatsu-lab / stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
☆30,271 · Jul 17, 2024 · Updated last year
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,428 · Jun 2, 2025 · Updated 9 months ago
openai / tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
☆17,599 · Feb 8, 2026 · Updated last month
guidance-ai / guidance
A guidance language for controlling large language models.
☆21,346 · Updated this week
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆11,704 · Mar 5, 2026 · Updated last week
meta-llama / llama
Inference code for Llama models
☆59,221 · Jan 26, 2025 · Updated last year
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆32,853 · Updated this week
Significant-Gravitas / AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus o…
☆182,560 · Updated this week
deepspeedai / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆41,807 · Updated this week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆73,479 · Updated this week
yoheinakajima / babyagi
☆22,187 · Jan 31, 2026 · Updated last month
microsoft / autogen
A programming framework for agentic AI
☆55,559 · Mar 11, 2026 · Updated last week
tloen / alpaca-lora
Instruct-tune LLaMA on consumer hardware
☆18,965 · Jul 29, 2024 · Updated last year
microsoft / JARVIS
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
☆24,571 · Jul 29, 2025 · Updated 7 months ago
LAION-AI / Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamical…
☆37,433 · Aug 17, 2024 · Updated last year
huggingface / trl
Train transformer language models with reinforcement learning.
☆17,697 · Updated this week
openai / simple-evals
☆4,398 · Jul 31, 2025 · Updated 7 months ago
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆20,809 · Updated this week
dair-ai / Prompt-Engineering-Guide
🐙 Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
☆71,514 · Mar 11, 2026 · Updated last week
vibrantlabsai / ragas
Supercharge Your LLM Application Evaluations 🚀
☆12,927 · Feb 24, 2026 · Updated 3 weeks ago
openai / openai-python
The official Python library for the OpenAI API
☆30,267 · Updated this week
huggingface / transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model…
☆157,783 · Updated this week
karpathy / nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆55,030 · Nov 12, 2025 · Updated 4 months ago
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,046 · Jan 23, 2026 · Updated last month
chenfei-wu / TaskMatrix
☆34,234 · Jan 6, 2024 · Updated 2 years ago
ggml-org / llama.cpp
LLM inference in C/C++
☆98,098 · Updated this week
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,765 · Mar 11, 2026 · Updated last week
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,803 · Jan 8, 2026 · Updated 2 months ago
hpcaitech / ColossalAI
Making large AI models cheaper, faster and more accessible
☆41,362 · Updated this week
artidoro / qlora
QLoRA: Efficient Finetuning of Quantized LLMs
☆10,850 · Jun 10, 2024 · Updated last year
zilliztech / GPTCache
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
☆7,963 · Jul 11, 2025 · Updated 8 months ago
nomic-ai / gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
☆77,226 · May 27, 2025 · Updated 9 months ago
BerriAI / litellm
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a…
☆38,879 · Updated this week
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆22,832 · Updated this week
openai / swarm
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
☆21,133 · Mar 11, 2025 · Updated last year