microsoft/ToolTalk

Evaluating tool-augmented LLMs in conversation settings

☆89

Alternatives and similar repositories for ToolTalk

Users that are interested in ToolTalk are comparing it to the libraries listed below.

IBM / API-BLEND
Companion code to https://arxiv.org/abs/2402.15491
☆22 · Sep 18, 2025 · Updated 10 months ago
NEUIR / P3Ranker
[SIGIR '22] Code for our SIGIR 2022 accepted paper: P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Pr…
☆18 · Sep 24, 2023 · Updated 2 years ago
HowieHwong / MetaTool
[ICLR'24] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
☆115 · Mar 21, 2024 · Updated 2 years ago
RAIVNLab / mnms
m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks
☆46 · Sep 26, 2024 · Updated last year
thunlp / ToolLearningPapers
☆922 · Jul 24, 2024 · Updated last year
Lukeming-tsinghua / Instruction-Tuning-for-Open-world-IE
☆21 · May 22, 2023 · Updated 3 years ago
PengjieRen / CaSE_RG
Conversations with Search Engines
☆14 · Jun 12, 2023 · Updated 3 years ago
JoeYing1019 / UltraTool
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
☆71 · Aug 5, 2025 · Updated 11 months ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆237 · Apr 15, 2025 · Updated last year
facebookresearch / ToolVerifier
This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.
☆23 · Mar 11, 2024 · Updated 2 years ago
quchangle1 / COLT
The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models.
☆26 · Nov 6, 2024 · Updated last year
SalesforceAIResearch / AgentLite
☆648 · Jun 2, 2026 · Updated last month
SivilTaram / FollowUp
public dataset for followup-query analysis, accepted by AAAI2019
☆15 · Aug 22, 2019 · Updated 6 years ago
nickvosk / sigir2020-query-resolution
☆13 · Jul 25, 2024 · Updated last year
chuzhumin98 / ConvSearch-Dataset
The homepage for ConvSearch Dataset.
☆14 · May 31, 2022 · Updated 4 years ago
ChuanMeng / Conversational-Information-Seeking
A Conversational Information Seeking (CIS) Paper Reading List Maintained by Chuan Meng.
☆29 · Sep 27, 2022 · Updated 3 years ago
allenai / lumos
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
☆478 · Mar 19, 2024 · Updated 2 years ago
MadeAgents / Hammer
Hammer: Robust Function-Calling for On-Device Language Models via Function Masking
☆120 · Jun 13, 2025 · Updated last year
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆76 · Jun 25, 2024 · Updated 2 years ago
noanabeshima / github-downloader
Script for downloading GitHub.
☆13 · Sep 24, 2020 · Updated 5 years ago
adihaviv / nopos
☆23 · Jul 27, 2023 · Updated 2 years ago
Ber666 / ToolkenGPT
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)
☆271 · Apr 18, 2024 · Updated 2 years ago
liyongqi67 / GCoQA
☆18 · Jun 24, 2025 · Updated last year
WorldEditors / PostKS
☆11 · May 26, 2020 · Updated 6 years ago
guxd / DialogBERT
Source Code for DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (https://arxiv.org/pdf/2012.0…
☆79 · Jan 2, 2022 · Updated 4 years ago
styfeng / GenAug
Code for GenAug: Data Augmentation for Finetuning Text Generators.
☆28 · Oct 8, 2021 · Updated 4 years ago
ielab / llm-qlm
Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
☆17 · Oct 26, 2023 · Updated 2 years ago
ARiSE-Lab / CYCLE_OOPSLA_24
Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation"
☆10 · Mar 8, 2024 · Updated 2 years ago
sneakers-the-rat / dissertation
my dissertation!
☆12 · Sep 6, 2022 · Updated 3 years ago
iai-group / UserSimCRS
Conversational Recommender System Evaluation via Simulation
☆22 · Jul 14, 2026 · Updated last week
night-chen / ToolQA
ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …
☆286 · Aug 19, 2023 · Updated 2 years ago
salesforce / BOLAA
☆192 · Jun 2, 2026 · Updated last month
The-Swarm-Corporation / swarms-core
Multi-threading, Concurrency, Asynchrony, and various Execution Methods implemented in a Rust backend for bleeding edge performance.
☆20 · Nov 11, 2024 · Updated last year
google-research-datasets / seq2act
This repository contains the open-source version of the datasets that were used for different parts of training and testing of models that grou…
☆35 · Aug 20, 2020 · Updated 5 years ago
RL10x / RetNet
An implementation of the paper "Retentive Network: A Successor to Transformer for Large Language Models" https://arxiv.org/pdf/2307.08621.pdf
☆11 · Jul 25, 2023 · Updated 2 years ago
awslabs / pptod
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System (ACL 2022)
☆162 · Dec 20, 2023 · Updated 2 years ago
magicgh / Ask-before-Plan
[EMNLP 2024] Ask-before-Plan: Proactive Language Agents for Real-World Planning
☆24 · Jul 28, 2025 · Updated 11 months ago
vikas95 / AIR-retriever
AIR retriever for Multi-Hop QA (ACL 2020 paper)
☆30 · Jul 18, 2020 · Updated 6 years ago
XueyangFeng / ReHAC
Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings)
☆34 · Sep 20, 2024 · Updated last year