sambanova/toolbench

ToolBench, an evaluation suite for LLM tool manipulation capabilities.

☆180

Alternatives and similar repositories for toolbench

Users who are interested in toolbench are comparing it to the repositories listed below.

sambanova / generative_data_prep
☆67 · Feb 4, 2026 · Updated 5 months ago
HowieHwong / MetaTool
[ICLR'24] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
☆115 · Mar 21, 2024 · Updated 2 years ago
Junjie-Ye / ToolEyes
[COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆74 · May 13, 2025 · Updated last year
OpenBMB / ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning.
☆5,708 · May 21, 2025 · Updated last year
thunlp / ToolLearningPapers
☆923 · Jul 24, 2024 · Updated 2 years ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆237 · Apr 15, 2025 · Updated last year
qiancheng0 / CREATOR
This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models"
☆31 · Oct 8, 2023 · Updated 2 years ago
Ber666 / ToolkenGPT
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)
☆271 · Apr 18, 2024 · Updated 2 years ago
open-compass / T-Eval
[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
☆312 · Apr 3, 2024 · Updated 2 years ago
kevinyaobytedance / llm_eval
LLM evaluation.
☆16 · Nov 7, 2023 · Updated 2 years ago
frt03 / jax_dt
Minimal Decision Transformer Implementation written in Jax (Flax).
☆18 · Aug 8, 2022 · Updated 3 years ago
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,601 · Feb 8, 2026 · Updated 5 months ago
thunlp / Knowledge-Inheritance
Source code for paper: Knowledge Inheritance for Pre-trained Language Models
☆37 · Apr 24, 2022 · Updated 4 years ago
hhan1018 / NesTools
[COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
☆18 · Jan 18, 2025 · Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆43 · Feb 15, 2024 · Updated 2 years ago
xlang-ai / xlang-paper-reading
Paper collection on building and evaluating language model agents via executable language grounding
☆364 · Apr 29, 2024 · Updated 2 years ago
magicgh / Self-MAP
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
☆16 · Oct 12, 2024 · Updated last year
AlongWY / gpustat
📊 A simple command-line utility for querying and monitoring GPU status
☆14 · Aug 3, 2023 · Updated 2 years ago
thunlp / ERICA
Source code for ACL 2021 paper "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learni…
☆85 · May 26, 2021 · Updated 5 years ago
THUDM / ChatGLM-Math
☆82 · Apr 18, 2024 · Updated 2 years ago
JoeYing1019 / UltraTool
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
☆71 · Aug 5, 2025 · Updated 11 months ago
tangqiaoyu / ToolAlpaca
the official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"
☆880 · Oct 26, 2024 · Updated last year
cognitiveailab / GPT-simulator
☆33 · Jun 12, 2024 · Updated 2 years ago
Tomiinek / Aargh
☆12 · Jan 2, 2024 · Updated 2 years ago
quchangle1 / LLM-Tool-Survey
This is the repository for the Tool Learning survey.
☆485 · Aug 9, 2025 · Updated 11 months ago
da03 / WildVisualizer
☆28 · Nov 19, 2025 · Updated 8 months ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆131 · Jun 18, 2023 · Updated 3 years ago
YuxiXie / SelfEval-Guided-Decoding
☆103 · Dec 7, 2023 · Updated 2 years ago
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆49 · May 26, 2024 · Updated 2 years ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆64 · Mar 26, 2024 · Updated 2 years ago
ExpressAI / reStructured-Pretraining
reStructured Pre-training
☆99 · Dec 22, 2022 · Updated 3 years ago
night-chen / ToolQA
ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …
☆286 · Aug 19, 2023 · Updated 2 years ago
hengyicai / Adaptive_Multi-curricula_Learning_for_Dialog
The codebase for "Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation" (Cai et al., AAAI 2020…
☆20 · Jun 18, 2024 · Updated 2 years ago
qinlibo-hit / Retriever-Dialogue
end-to-end dialog system dataset
☆13 · Sep 15, 2019 · Updated 6 years ago
evasharma / bigpatent
☆25 · Jun 25, 2019 · Updated 7 years ago
yiqingxyq / RepoST
Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"
☆24 · Mar 18, 2025 · Updated last year
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,961 · Apr 13, 2026 · Updated 3 months ago
RickySkywalker / LeanOfThought-Official
This is the official implementation for MA-LoT.
☆20 · Aug 4, 2025 · Updated 11 months ago
liujch1998 / rainier
☆29 · Feb 17, 2024 · Updated 2 years ago