zai-org/ComplexFuncBench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zai-org/ComplexFuncBench)

zai-org / ComplexFuncBench

Complex Function Calling Benchmark.

☆165

Alternatives and similar repositories for ComplexFuncBench

Users that are interested in ComplexFuncBench are comparing it to the libraries listed below

Sorting:

swt-user / DMPO
View on GitHub
☆52Oct 10, 2024Updated last year
sierra-research / tau-bench
View on GitHub
Code and Data for Tau-Bench
☆1,103Aug 28, 2025Updated 6 months ago
chtmp223 / suri
View on GitHub
Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24)
☆27Oct 3, 2025Updated 4 months ago
jinzhuoran / RAG-RewardBench
View on GitHub
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆16Dec 19, 2024Updated last year
IBM / NESTFUL
View on GitHub
Companion code to https://arxiv.org/abs/2409.03797v2
☆19Sep 18, 2025Updated 5 months ago
shuzhangzhong / HybriMoE-Preview
View on GitHub
☆17Apr 9, 2025Updated 10 months ago
VITA-Group / WeLore
View on GitHub
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆52Oct 30, 2025Updated 4 months ago
meowpass / FollowComplexInstruction
View on GitHub
Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…
☆53Jun 24, 2024Updated last year
MTU-Bench-Team / MTU-Bench
View on GitHub
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
☆58Jul 24, 2025Updated 7 months ago
SALT-NLP / demonstrated-feedback
View on GitHub
☆130Oct 1, 2024Updated last year
IBM / API-BLEND
View on GitHub
Companion code to https://arxiv.org/abs/2402.15491
☆22Sep 18, 2025Updated 5 months ago
nexusflowai / NexusBench
View on GitHub
Nexusflow function call, tool use, and agent benchmarks.
☆30Dec 13, 2024Updated last year
LG-AI-EXAONE / K-EXAONE
View on GitHub
Official repository for K-EXAONE built by LG AI Research
☆69Feb 6, 2026Updated 3 weeks ago
DaoD / SPRING
View on GitHub
[AAAI'25] SPRING: Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models
☆25Sep 24, 2025Updated 5 months ago
ignorejjj / LongRefiner
View on GitHub
The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral]
☆42Aug 25, 2025Updated 6 months ago
monologg / naver-nlp-challenge-2018
View on GitHub
NER task for Naver NLP Challenge 2018 (3rd Place)
☆18Mar 24, 2023Updated 2 years ago
apple / ToolSandbox
View on GitHub
☆240Nov 7, 2025Updated 3 months ago
om-ai-lab / open-agent-leaderboard
View on GitHub
Reproducible Language Agent Research
☆34Jun 25, 2025Updated 8 months ago
mira-ai-lab / DoG
View on GitHub
☆25Apr 15, 2025Updated 10 months ago
GATECH-EIC / LaCache
View on GitHub
[ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
☆17Nov 4, 2025Updated 3 months ago
Rachum-thu / LongPiBench
View on GitHub
The repository for papaer "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"
☆14Dec 16, 2024Updated last year
MidiyaZhu / MePO
View on GitHub
Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization
☆12Jan 12, 2026Updated last month
hsaest / Agent-Planning-Analysis
View on GitHub
[NAACL'25] "Revealing the Barriers of Language Agents in Planning"
☆13Jun 22, 2025Updated 8 months ago
cxcscmu / deepresearch_benchmarking
View on GitHub
☆26Jul 29, 2025Updated 7 months ago
leileqiTHU / Attacker
View on GitHub
The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13Apr 23, 2025Updated 10 months ago
MDI-Benchmark / MDI-Benchmark
View on GitHub
☆14Dec 18, 2024Updated last year
jiwonsong-dev / ReasoningPathCompression
View on GitHub
[NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆30Oct 20, 2025Updated 4 months ago
thu-coai / SPaR
View on GitHub
☆46Jun 11, 2025Updated 8 months ago
ECNU-ICALK / EduChat-Math
View on GitHub
[MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
☆53Oct 20, 2024Updated last year
THU-KEG / Agentic-Reward-Modeling
View on GitHub
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆125Jun 11, 2025Updated 8 months ago
Trae1ounG / BuPO
View on GitHub
[arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
☆59Feb 6, 2026Updated 3 weeks ago
J-Seo / KoCommonGEN-V2
View on GitHub
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
☆25Aug 24, 2024Updated last year
drx-code / EquivariantModeling
View on GitHub
Official PyTorch implementation of the paper "Equivariant Image Modeling"(https://arxiv.org/abs/2503.18948)
☆35Aug 1, 2025Updated 7 months ago
LG-AI-EXAONE / KoMT-Bench
View on GitHub
Official repository for KoMT-Bench built by LG AI Research
☆71Aug 8, 2024Updated last year
ali-vilab / CDT
View on GitHub
Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
☆14Apr 2, 2025Updated 11 months ago
UKPLab / arxiv2025-inherent-limits-plms
View on GitHub
Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le…
☆13Jan 16, 2025Updated last year
lqzxt / NGTR
View on GitHub
☆13May 26, 2025Updated 9 months ago
D2I-ai / Route
View on GitHub
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL (ICLR 2025 Pytorch Code)
☆17May 15, 2025Updated 9 months ago
NJUNLP / Hallu-PI
View on GitHub
The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …
☆11Sep 27, 2024Updated last year