open-compass/GTA



[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

☆147

Alternatives and similar repositories for GTA

Users interested in GTA compare it to the libraries listed below.


Zhudongsheng75 / Divide-Then-Aggregate
(ACL 2025) Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation
☆12 · May 21, 2025 · Updated last year
hhan1018 / NesTools
[COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
☆18 · Jan 18, 2025 · Updated last year
Fugtemypt123 / ToolVQA-release
Codebase for paper ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
☆31 · Nov 3, 2025 · Updated 8 months ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆237 · Apr 15, 2025 · Updated last year
quchangle1 / LLM-Tool-Survey
This is the repository for the Tool Learning survey.
☆485 · Aug 9, 2025 · Updated 11 months ago
YuxiangChai / AMEX-codebase
☆33 · Sep 27, 2024 · Updated last year
cxcscmu / General-AgentBench
Benchmark Test-Time Scaling of General LLM Agents
☆20 · Apr 14, 2026 · Updated 3 months ago
WooooDyy / AgentGym
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…
☆816 · May 30, 2026 · Updated last month
JoeYing1019 / UltraTool
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
☆71 · Aug 5, 2025 · Updated 11 months ago
HowieHwong / MetaTool
[ICLR'24] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
☆115 · Mar 21, 2024 · Updated 2 years ago
yuyq18 / StepTool
☆36 · May 24, 2025 · Updated last year
StonyBrookNLP / appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource…
☆468 · Feb 17, 2026 · Updated 5 months ago
sierra-research / tau-bench
Code and Data for Tau-Bench
☆1,342 · Mar 18, 2026 · Updated 4 months ago
zorazrw / awesome-tool-llm
☆248 · Aug 14, 2024 · Updated last year
XiangLi1999 / AutoBencher
☆33 · Jul 11, 2024 · Updated 2 years ago
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆427 · May 20, 2024 · Updated 2 years ago
zjunlp / WorfBench
[ICLR 2025] Benchmarking Agentic Workflow Generation
☆155 · Feb 19, 2025 · Updated last year
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆44 · Jun 28, 2024 · Updated 2 years ago
imagination-research / LCSC
[ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
☆16 · Feb 15, 2025 · Updated last year
OSU-NLP-Group / TravelPlanner
[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"
☆529 · May 24, 2026 · Updated last month
IBM / API-BLEND
Companion code to https://arxiv.org/abs/2402.15491
☆22 · Sep 18, 2025 · Updated 10 months ago
open-compass / MathBench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
☆115 · May 22, 2025 · Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
SalesforceAIResearch / swecomm
☆28 · Jun 2, 2026 · Updated last month
modelscope / MCPBench
The evaluation benchmark on MCP servers
☆250 · Sep 3, 2025 · Updated 10 months ago
IBM / NESTFUL
Companion code to https://arxiv.org/abs/2409.03797v2
☆19 · Sep 18, 2025 · Updated 10 months ago
sanjibanc / agent_prm
☆60 · Feb 19, 2025 · Updated last year
princeton-nlp / WebShop
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆572 · Sep 6, 2024 · Updated last year
kaist-ami / BEAF
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
☆22 · Mar 26, 2025 · Updated last year
caoyxuan / W2PGNN
Code for KDD feasibility
☆12 · Jul 17, 2023 · Updated 3 years ago
MLLM-Data-Contamination / MM-Detect
Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM | EMNLP 2025 Findings
☆18 · Oct 17, 2025 · Updated 9 months ago
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆54 · Jun 6, 2025 · Updated last year
ModalMinds / gym-v
A unified framework for vision-language environments with Gymnasium-compatible interface
☆35 · Mar 17, 2026 · Updated 4 months ago
SkyworkAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆17 · Jun 3, 2024 · Updated 2 years ago
zjunlp / TRICE
[NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback
☆43 · Mar 14, 2024 · Updated 2 years ago
qiancheng0 / ToolRL
☆513 · Oct 16, 2025 · Updated 9 months ago
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,597 · Feb 8, 2026 · Updated 5 months ago
THUDM / WebRL
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
☆535 · Jun 6, 2025 · Updated last year
facebookresearch / MetaEmbed
[ICLR 2026 Oral] Official Implementation of the paper "MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interactio…
☆18 · Jul 2, 2026 · Updated 3 weeks ago