Complex Function Calling Benchmark.
☆165 Jan 20, 2025 Updated last year
Alternatives and similar repositories for ComplexFuncBench
Users who are interested in ComplexFuncBench are comparing it to the repositories listed below:
- ☆52 Oct 10, 2024 Updated last year
- Code and Data for Tau-Bench ☆1,103 Aug 28, 2025 Updated 6 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 Oct 3, 2025 Updated 4 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Dec 19, 2024 Updated last year
- Companion code to https://arxiv.org/abs/2409.03797v2 ☆19 Sep 18, 2025 Updated 5 months ago
- ☆17 Apr 9, 2025 Updated 10 months ago
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆52 Oct 30, 2025 Updated 4 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆53 Jun 24, 2024 Updated last year
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆58 Jul 24, 2025 Updated 7 months ago
- ☆130 Oct 1, 2024 Updated last year
- Companion code to https://arxiv.org/abs/2402.15491 ☆22 Sep 18, 2025 Updated 5 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆30 Dec 13, 2024 Updated last year
- Official repository for K-EXAONE built by LG AI Research ☆69 Feb 6, 2026 Updated 3 weeks ago
- [AAAI'25] SPRING: Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models ☆25 Sep 24, 2025 Updated 5 months ago
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral] ☆42 Aug 25, 2025 Updated 6 months ago
- NER task for Naver NLP Challenge 2018 (3rd Place) ☆18 Mar 24, 2023 Updated 2 years ago
- ☆240 Nov 7, 2025 Updated 3 months ago
- Reproducible Language Agent Research ☆34 Jun 25, 2025 Updated 8 months ago
- ☆25 Apr 15, 2025 Updated 10 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆17 Nov 4, 2025 Updated 3 months ago
- The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs" ☆14 Dec 16, 2024 Updated last year
- Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization ☆12 Jan 12, 2026 Updated last month
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 Jun 22, 2025 Updated 8 months ago
- ☆26 Jul 29, 2025 Updated 7 months ago
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1 ☆13 Apr 23, 2025 Updated 10 months ago
- ☆14 Dec 18, 2024 Updated last year
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆30 Oct 20, 2025 Updated 4 months ago
- ☆46 Jun 11, 2025 Updated 8 months ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models ☆53 Oct 20, 2024 Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆125 Jun 11, 2025 Updated 8 months ago
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies ☆59 Feb 6, 2026 Updated 3 weeks ago
- KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models ☆25 Aug 24, 2024 Updated last year
- Official PyTorch implementation of the paper "Equivariant Image Modeling" (https://arxiv.org/abs/2503.18948) ☆35 Aug 1, 2025 Updated 7 months ago
- Official repository for KoMT-Bench built by LG AI Research ☆71 Aug 8, 2024 Updated last year
- Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach ☆14 Apr 2, 2025 Updated 11 months ago
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le… ☆13 Jan 16, 2025 Updated last year
- ☆13 May 26, 2025 Updated 9 months ago
- ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL (ICLR 2025 Pytorch Code) ☆17 May 15, 2025 Updated 9 months ago
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed … ☆11 Sep 27, 2024 Updated last year