MTU-Bench-Team / MTU-Bench
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
☆38 Updated this week
Alternatives and similar repositories for MTU-Bench:
Users who are interested in MTU-Bench are comparing it to the repositories listed below:
- ☆42 Updated 2 months ago
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking ☆58 Updated this week
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆49 Updated 4 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆116 Updated 3 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆65 Updated 2 months ago
- The official repository of the Omni-MATH benchmark. ☆71 Updated last month
- Reformatted Alignment ☆114 Updated 4 months ago
- Critique-out-Loud Reward Models ☆51 Updated 4 months ago
- ☆98 Updated 2 months ago
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆52 Updated 10 months ago
- Code implementation of synthetic continued pretraining ☆88 Updated last month
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆44 Updated 7 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated 11 months ago
- ☆53 Updated 3 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆52 Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆95 Updated 4 months ago
- ☆81 Updated 10 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆72 Updated 8 months ago
- ☆48 Updated 11 months ago
- The demo, code and data of FollowRAG ☆69 Updated 2 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆135 Updated 3 months ago
- ☆45 Updated 4 months ago
- ☆58 Updated 5 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆44 Updated last month
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆154 Updated last month
- [ICLR 2025] SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ☆53 Updated last week
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated 11 months ago
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales ☆72 Updated 2 weeks ago