hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents
☆243 Updated 5 months ago
Related projects
Alternatives and complementary repositories for AgentBoard
- [ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning ☆177 Updated 3 weeks ago
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ☆277 Updated 2 weeks ago
- ☆116 Updated 5 months ago
- Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ☆230 Updated 3 weeks ago
- An extensible benchmark for evaluating large language models on planning ☆288 Updated 5 months ago
- FireAct: Toward Language Agent Fine-tuning ☆254 Updated last year
- Generative Judge for Evaluating Alignment ☆216 Updated 9 months ago
- RewardBench: the first evaluation tool for reward models. ☆424 Updated 2 weeks ago
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆233 Updated 6 months ago
- ☆283 Updated last month
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆143 Updated 8 months ago
- Implementation of "RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation". ☆176 Updated 5 months ago
- ☆171 Updated 6 months ago
- KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents ☆171 Updated 2 weeks ago
- A large-scale, fine-grained, diverse preference dataset (and models). ☆309 Updated 10 months ago
- ☆189 Updated 2 months ago
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ☆112 Updated last month
- An implementation of Everything of Thoughts (XoT). ☆129 Updated 8 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆194 Updated this week
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆153 Updated 7 months ago
- ☆211 Updated 3 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆108 Updated 5 months ago
- AWM: Agent Workflow Memory ☆203 Updated last month
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆147 Updated this week
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆196 Updated 3 months ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆435 Updated 7 months ago
- [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆273 Updated 2 months ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆239 Updated last year
- VisualWebArena is a benchmark for multimodal agents. ☆235 Updated last month
- This is the official repo for "PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization". PromptAgen… ☆199 Updated 3 months ago