JoeYing1019/UltraTool

[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

☆71

Alternatives and similar repositories for UltraTool

Users who are interested in UltraTool compare it to the repositories listed below.


JoeYing1019 / SDIF-DA
[ICASSP2024] Code for the paper "SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection"
☆16 · Jul 6, 2024 · Updated 2 years ago
HarlynDN / WebCiteS
[ACL'24] WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
☆13 · Sep 11, 2024 · Updated last year
HowieHwong / MetaTool
[ICLR'24] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
☆115 · Mar 21, 2024 · Updated 2 years ago
Junjie-Ye / ToolEyes
[COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆74 · May 13, 2025 · Updated last year
facebookresearch / ToolVerifier
This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.
☆23 · Mar 11, 2024 · Updated 2 years ago
hhan1018 / NesTools
[COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
☆18 · Jan 18, 2025 · Updated last year
WxxShirley / KDD2024ProCom
Code and data for the KDD 2024 Research Track paper "ProCom: A Few-shot Targeted Community Detection Algorithm"
☆11 · Aug 15, 2024 · Updated last year
hrwise-nlp / ToolsMeetLLMs
☆33 · May 8, 2025 · Updated last year
AngxiaoYue / awesome-llm-tool-learning
A list of awesome papers on LLM tool learning.
☆28 · Jul 24, 2024 · Updated 2 years ago
metacarbon / shareAtt
Beyond KV Caching: Shared Attention for Efficient LLMs
☆20 · Jul 19, 2024 · Updated 2 years ago
shuangyulin / ssm_familycash
A JSP household income and expenditure management system built on SSM
☆10 · May 10, 2023 · Updated 3 years ago
SCNU203 / GeoQA-Plus
☆20 · May 14, 2024 · Updated 2 years ago
Fantabulous-J / coref-HGAT
PyTorch implementation of our NAACL 2021 paper "Incorporating Syntax and Semantics in Coreference Resolution with Heterogeneous Graph Att…
☆10 · Apr 28, 2022 · Updated 4 years ago
Jiaxin-Pei / Potato-Prolific-Dataset
☆17 · Jun 14, 2023 · Updated 3 years ago
Some-random / theorem-proving-reasoning
Code for the paper LeanReasoner: Boosting Complex Logical Reasoning with Lean: https://arxiv.org/pdf/2403.13312.pdf
☆27 · May 25, 2024 · Updated 2 years ago
microsoft / simulated-trial-and-error
☆124 · Jun 6, 2024 · Updated 2 years ago
Yuanhy1997 / Auto-Diagnosis-by-RL-and-Classification
Efficient Symptom Inquiring and Diagnosis via Adaptive Alignment of Reinforcement Learning and Classification [AI in Medicine Journal]
☆14 · May 20, 2022 · Updated 4 years ago
NExTplusplus / L2I
The baseline method for CCIR 22: https://www.datafountain.cn/competitions/573
☆13 · Aug 2, 2022 · Updated 3 years ago
AnWang-AI / AugABSA
This repository contains the code for the *Sem 2023 paper "Generative Data Augmentation for Aspect Sentiment Quad Prediction".
☆10 · May 30, 2023 · Updated 3 years ago
fairyshine / Seal-Tools
The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…
☆57 · Nov 5, 2024 · Updated last year
SALT-NLP / PopupAttack
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
☆51 · Dec 23, 2024 · Updated last year
thunlp / ToolLearningPapers
☆923 · Jul 24, 2024 · Updated 2 years ago
google-research-datasets / rico_semantics
Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations b…
☆36 · Jun 27, 2024 · Updated 2 years ago
brucewsy / AD-KD
Source code for the ACL 2023 paper "AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression"
☆13 · Jun 14, 2023 · Updated 3 years ago
open-compass / GTA
[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2
☆147 · Apr 20, 2026 · Updated 3 months ago
JiaWeiSii / gorgeous
https://jiaweisii.github.io/gorgeous/
☆18 · Feb 24, 2026 · Updated 5 months ago
SalesforceAIResearch / FoFo
☆27 · Jun 2, 2026 · Updated last month
JollyHe / FanacialSys
Complete source code for a household financial management system
☆16 · Aug 13, 2017 · Updated 8 years ago
salesforce / BOLAA
☆192 · Jun 2, 2026 · Updated last month
NanshineLoong / Self-Evolving-Benchmark
A framework for evolving and testing question-answering datasets with various models.
☆26 · Feb 28, 2024 · Updated 2 years ago
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆427 · May 20, 2024 · Updated 2 years ago
passing2961 / Stark
Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…
☆19 · Dec 27, 2024 · Updated last year
Reason-Wang / NAT
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆28 · Mar 14, 2024 · Updated 2 years ago
dyabel / AnyTool
☆318 · Mar 26, 2024 · Updated 2 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11 · Feb 7, 2026 · Updated 5 months ago
LinkAnonymous / BESA
☆12 · Oct 9, 2023 · Updated 2 years ago
albertwy / GPT-4V-Evaluation
Data for evaluating GPT-4V
☆11 · Oct 26, 2023 · Updated 2 years ago
oriyor / turning_tables
Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…
☆22 · Nov 2, 2021 · Updated 4 years ago
tangqiaoyu / ToolAlpaca
The official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"
☆879 · Oct 26, 2024 · Updated last year