hkust-nlp/AgentBoard

An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]

☆427

Alternatives and similar repositories for AgentBoard

Users that are interested in AgentBoard are comparing it to the libraries listed below.

THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,603 · Feb 8, 2026 · Updated 5 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆168 · Oct 30, 2024 · Updated last year
zjunlp / AutoAct
[ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning
☆238 · Jan 13, 2025 · Updated last year
zhao-ht / LearnAct
Code for paper Empowering Large Language Model Agents through Action Learning
☆34 · Aug 8, 2024 · Updated last year
LZhengisme / self-infilling
[ICML 2024] Self-Infilling Code Generation
☆18 · May 5, 2024 · Updated 2 years ago
hkust-nlp / GUIMid
☆22 · May 3, 2025 · Updated last year
LeapLabTHU / ExpeL
☆228 · Dec 20, 2024 · Updated last year
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆223 · Jul 25, 2024 · Updated 2 years ago
alfworld / alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
☆810 · Feb 8, 2026 · Updated 5 months ago
web-arena-x / webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
☆1,557 · Nov 26, 2025 · Updated 8 months ago
Timothyxxx / KVCachePapers
☆20 · May 24, 2024 · Updated 2 years ago
WooooDyy / AgentGym
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…
☆817 · May 30, 2026 · Updated last month
chang-github-00 / LLM-Predictive-Decoding
☆16 · Jul 9, 2025 · Updated last year
OpenLemur / Lemur
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
☆557 · Oct 28, 2023 · Updated 2 years ago
SalesforceAIResearch / xLAM
xLAM: A Family of Large Action Models to Empower AI Agent Systems
☆636 · Jun 2, 2026 · Updated last month
SalesforceAIResearch / AgentLite
☆648 · Jun 2, 2026 · Updated last month
princeton-nlp / WebShop
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆572 · Sep 6, 2024 · Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆149 · Nov 26, 2024 · Updated last year
anchen1011 / FireAct
FireAct: Toward Language Agent Fine-tuning
☆296 · Oct 22, 2023 · Updated 2 years ago
ysymyth / awesome-language-agents
List of language agents based on paper "Cognitive Architectures for Language Agents"
☆1,247 · Jan 16, 2025 · Updated last year
OpenBMB / ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning.
☆5,709 · May 21, 2025 · Updated last year
THUDM / AgentTuning
AgentTuning: Enabling Generalized Agent Abilities for LLMs
☆1,501 · Oct 31, 2023 · Updated 2 years ago
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆370 · Dec 3, 2025 · Updated 7 months ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆141 · Jun 4, 2024 · Updated 2 years ago
salesforce / BOLAA
☆192 · Jun 2, 2026 · Updated last month
neulab / MultiUI
Code for Paper: Harnessing Webpage UIs For Text Rich Visual Understanding
☆54 · Dec 12, 2024 · Updated last year
hrwise-nlp / ToolsMeetLLMs
☆33 · May 8, 2025 · Updated last year
weirayao / Retroformer
☆39 · May 2, 2024 · Updated 2 years ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆237 · Apr 15, 2025 · Updated last year
maitrix-org / llm-reasoners
A library for advanced large language model reasoning
☆2,341 · Jun 10, 2025 · Updated last year
zjunlp / LLMAgentPapers
Must-read Papers on LLM Agents.
☆3,089 · Jul 5, 2026 · Updated 3 weeks ago
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆253 · May 5, 2024 · Updated 2 years ago
hrwise-nlp / AppBench
This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
☆16 · Nov 4, 2024 · Updated last year
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆208 · Apr 17, 2025 · Updated last year
Junjie-Ye / ToolEyes
[COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆74 · May 13, 2025 · Updated last year
InternLM / Agent-FLAN
[ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
☆361 · Mar 22, 2024 · Updated 2 years ago
sierra-research / tau-bench
Code and Data for Tau-Bench
☆1,345 · Mar 18, 2026 · Updated 4 months ago
hyp1231 / awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
☆2,251 · Apr 30, 2025 · Updated last year
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86 · May 21, 2025 · Updated last year