THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

☆3,582

Alternatives and similar repositories for AgentBench

Users who are interested in AgentBench are comparing it to the repositories listed below.

THUDM / AgentTuning
AgentTuning: Enabling Generalized Agent Abilities for LLMs
☆1,499 · Oct 31, 2023 · Updated 2 years ago
OpenBMB / ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
☆5,701 · May 21, 2025 · Updated last year
Paitesanshi / LLM-Agent-Survey
☆2,908 · Feb 20, 2025 · Updated last year
WooooDyy / LLM-Agent-Paper-List
The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et a…
☆8,166 · Sep 12, 2025 · Updated 10 months ago
THUDM / VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
☆270 · Apr 24, 2025 · Updated last year
web-arena-x / webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
☆1,547 · Nov 26, 2025 · Updated 7 months ago
aiwaves-cn / agents
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
☆5,946 · Sep 26, 2024 · Updated last year
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆425 · May 20, 2024 · Updated 2 years ago
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,777 · Aug 4, 2024 · Updated last year
OSU-NLP-Group / Mind2Web
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist w…
☆1,015 · Nov 5, 2025 · Updated 8 months ago
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,878 · Updated this week
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆7,208 · Updated this week
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,336 · Updated this week
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,491 · May 1, 2026 · Updated 2 months ago
xlang-ai / OpenAgents
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
☆4,848 · Nov 18, 2024 · Updated last year
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,953 · Apr 13, 2026 · Updated 3 months ago
tatsu-lab / alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
☆2,004 · Aug 9, 2025 · Updated 11 months ago
nlpxucan / WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
☆9,479 · Jun 7, 2025 · Updated last year
princeton-nlp / tree-of-thought-llm
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
☆6,029 · Jan 16, 2025 · Updated last year
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,821 · Updated this week
noahshinn / reflexion
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
☆3,207 · Jan 14, 2025 · Updated last year
alfworld / alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
☆807 · Feb 8, 2026 · Updated 5 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,542 · Updated this week
ysymyth / ReAct
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
☆4,059 · Feb 6, 2024 · Updated 2 years ago
camel-ai / camel
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
☆17,421 · Updated this week
yizhongw / self-instruct
Aligning pretrained language models with instruction data generated by themselves.
☆4,606 · Mar 27, 2023 · Updated 3 years ago
hyp1231 / awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
☆2,249 · Apr 30, 2025 · Updated last year
maitrix-org / llm-reasoners
A library for advanced large language model reasoning
☆2,339 · Jun 10, 2025 · Updated last year
princeton-nlp / WebShop
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆571 · Sep 6, 2024 · Updated last year
OpenBMB / AgentVerse
🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides …
☆5,081 · Sep 9, 2024 · Updated last year
zjunlp / LLMAgentPapers
Must-read Papers on LLM Agents.
☆3,083 · Jul 5, 2026 · Updated 2 weeks ago
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,150 · Jun 1, 2023 · Updated 3 years ago
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,634 · Updated this week
THUDM / LongBench
LongBench v2 and LongBench (ACL '25 & '24)
☆1,212 · Jan 15, 2025 · Updated last year
OpenGVLab / LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,916 · Mar 14, 2024 · Updated 2 years ago
OpenBMB / XAgent
An Autonomous LLM Agent for Complex Task Solving
☆8,525 · Aug 12, 2024 · Updated last year
thunlp / ToolLearningPapers
☆922 · Jul 24, 2024 · Updated last year
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,752 · Jan 8, 2024 · Updated 2 years ago
mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆7,248 · Jul 11, 2024 · Updated 2 years ago