ToolBench, an evaluation suite for LLM tool manipulation capabilities.
☆172Feb 28, 2024Updated 2 years ago
Alternatives and similar repositories for toolbench
Users that are interested in toolbench are comparing it to the libraries listed below
Sorting:
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use☆109Mar 21, 2024Updated last year
- ☆31Jun 12, 2024Updated last year
- ☆917Jul 24, 2024Updated last year
- ☆66Feb 4, 2026Updated last month
- [ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning.☆5,536May 21, 2025Updated 9 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆73May 13, 2025Updated 9 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 8 months ago
- ☆25Nov 19, 2025Updated 3 months ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆15Feb 4, 2025Updated last year
- This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models"☆29Oct 8, 2023Updated 2 years ago
- Official repository of Graph RAG-Tool Fusion and ToolLinkOS dataset.☆22Feb 13, 2025Updated last year
- [COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models☆18Jan 18, 2025Updated last year
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.☆217Apr 15, 2025Updated 10 months ago
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step☆304Apr 3, 2024Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)☆270Apr 18, 2024Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆3,187Feb 8, 2026Updated 3 weeks ago
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.☆22Mar 11, 2024Updated last year
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- Paper collection on building and evaluating language model agents via executable language grounding☆365Apr 29, 2024Updated last year
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.☆132Jun 18, 2023Updated 2 years ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks☆20May 10, 2022Updated 3 years ago
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆23Mar 18, 2025Updated 11 months ago
- The codebase for "Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation" (Cai et al., AAAI 2020…☆20Jun 18, 2024Updated last year
- ☆83Apr 18, 2024Updated last year
- This is code for the EMNLP 2022 Paper "UniRPG: Unified Discrete Reasoning over Table and Text as Program Generation".☆10Apr 30, 2023Updated 2 years ago
- This is the official implementation for MA-LoT.☆19Aug 4, 2025Updated 7 months ago
- This is the repository for the Tool Learning survey.☆481Aug 9, 2025Updated 6 months ago
- the official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"☆884Oct 26, 2024Updated last year
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024)☆86Feb 25, 2024Updated 2 years ago
- Source code for paper: Knowledge Inheritance for Pre-trained Language Models☆38Apr 24, 2022Updated 3 years ago
- Codes for paper SoAy: A Service-oriented APIs Applying Framework of Large Language Models☆27Jul 14, 2025Updated 7 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆184May 20, 2025Updated 9 months ago
- codes for ICML2021 paper iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients☆10May 27, 2021Updated 4 years ago
- Code of LeCoRE☆13Feb 15, 2023Updated 3 years ago
- ☆12Mar 5, 2025Updated 11 months ago
- AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation (EMNLP 2024 Findings)☆15Dec 30, 2024Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators☆43Feb 15, 2024Updated 2 years ago
- Memory experiments with LLMs☆11Mar 31, 2023Updated 2 years ago