xingyaoww/mint-bench

Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng and Heng Ji.

☆141

Alternatives and similar repositories for mint-bench

Users that are interested in mint-bench are comparing it to the libraries listed below.

ozyyshr / ShareGPT_investigation
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023)
☆13 · Dec 21, 2023 · Updated 2 years ago
lifan-yuan / PLMCalibration
Code for ACL 2023 paper "A Close Look into the Calibration of Pre-trained Language Models"
☆11 · May 9, 2023 · Updated 3 years ago
lifan-yuan / CRAFT
Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"
☆62 · Jun 3, 2024 · Updated 2 years ago
yzjiao / On-Demand-IE
Code and dataset for the EMNLP paper "Instruct and Extract: Instruction Tuning for On-Demand Information Extraction"
☆55 · Jan 2, 2024 · Updated 2 years ago
magicgh / Self-MAP
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
☆16 · Oct 12, 2024 · Updated last year
night-chen / ToolQA
ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …
☆286 · Aug 19, 2023 · Updated 2 years ago
didiforgithub / SwarmAgent
🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior…
☆13 · Dec 5, 2023 · Updated 2 years ago
Junjie-Ye / ToolEyes
[COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆74 · May 13, 2025 · Updated last year
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24 · Mar 12, 2024 · Updated 2 years ago
maszhongming / ParaKnowTransfer
Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆33 · May 9, 2024 · Updated 2 years ago
dunzeng / MORE
Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment
☆16 · Aug 6, 2024 · Updated last year
iwangjian / pyloader
🐳 PyLoader: An asynchronous Python dataloader for loading big datasets, supporting PyTorch and TensorFlow 2.x.
☆11 · Aug 29, 2021 · Updated 4 years ago
open-compass / T-Eval
[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
☆312 · Apr 3, 2024 · Updated 2 years ago
lqtrung1998 / mwp_cot_design
☆14 · Oct 11, 2023 · Updated 2 years ago
ShujinWu-0814 / MACAROON
Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"
☆14 · Sep 28, 2024 · Updated last year
liyongqi67 / GCoQA
☆18 · Jun 24, 2025 · Updated last year
lifan-yuan / OOD_NLP
[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…
☆37 · Jun 8, 2023 · Updated 3 years ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆237 · Apr 15, 2025 · Updated last year
open-compass / BotChat
Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances.
☆163 · May 22, 2025 · Updated last year
chenhongqiao / ToolDec
Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding
☆31 · Jan 28, 2024 · Updated 2 years ago
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,603 · Feb 8, 2026 · Updated 5 months ago
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆427 · May 20, 2024 · Updated 2 years ago
allenai / lumos
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
☆477 · Mar 19, 2024 · Updated 2 years ago
THUDM / AgentTuning
AgentTuning: Enabling Generalized Agent Abilities for LLMs
☆1,501 · Oct 31, 2023 · Updated 2 years ago
raspberryice / inc-schema
Code for paper "Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification"
☆17 · Jul 4, 2023 · Updated 3 years ago
Leezekun / dialogic
[EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning"
☆34 · Feb 22, 2023 · Updated 3 years ago
NanshineLoong / Self-Evolving-Benchmark
A framework for evolving and testing question-answering datasets with various models.
☆26 · Feb 28, 2024 · Updated 2 years ago
cisnlp / mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
☆11 · Jan 19, 2024 · Updated 2 years ago
dxhou / CoAct
☆32 · Jul 8, 2024 · Updated 2 years ago
Gentopia-AI / Gentopia
Build Hierarchical Autonomous Agents through Config. Collaborative Growth of Specialized Agents.
☆328 · Nov 27, 2023 · Updated 2 years ago
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆727 · Feb 16, 2026 · Updated 5 months ago
thunlp / ToolLearningPapers
☆923 · Jul 24, 2024 · Updated 2 years ago
SALT-NLP / normbank
Data and code for the paper "NormBank: A Knowledge Bank of Situational Social Norms"
☆34 · Jul 18, 2023 · Updated 3 years ago
OpenLemur / Lemur
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
☆557 · Oct 28, 2023 · Updated 2 years ago
wenzhe-li / Self-MoA
☆17 · Feb 4, 2025 · Updated last year
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
nuaa-nlp / Multimodality
☆15 · Dec 10, 2021 · Updated 4 years ago
OpenCoder-llm / opc_data_filtering
Heuristic filtering framework for RefineCode
☆87 · Mar 13, 2025 · Updated last year
FreedomIntelligence / GPT-API-Accelerate
The "GPT-API-Accelerate" project provides a set of Python classes for accelerating the process of generating responses to prompts using t…
☆23 · Oct 12, 2024 · Updated last year