THUDM / ComplexFuncBench

Complex Function Calling Benchmark.

☆117

Alternatives and similar repositories for ComplexFuncBench

Users who are interested in ComplexFuncBench are comparing it to the repositories listed below.

facebookresearch / ReasonIR
Official repository for the paper "ReasonIR: Training Retrievers for Reasoning Tasks".
☆176 Updated 3 weeks ago
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆138 Updated 8 months ago
Ayanami0730 / deep_research_bench
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
☆190 Updated 3 weeks ago
SALT-NLP / demonstrated-feedback
☆124 Updated 9 months ago
Nardien / agent-distillation
Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"
☆115 Updated last month
arcee-ai / EvolKit
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆229 Updated 8 months ago
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆163 Updated last week
wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆113 Updated 10 months ago
microsoft / FILM
Official repo for "Make Your LLM Fully Utilize the Context"
☆252 Updated last year
xlang-ai / Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
☆127 Updated 10 months ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains
☆149 Updated last month
microsoft / lost_in_conversation
Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)
☆141 Updated 3 weeks ago
DataArcTech / LLM-as-a-Judge
☆122 Updated 3 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 Updated 9 months ago
QwenLM / WorldPM
☆90 Updated last month
sunnynexus / RetroLLM
RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation [ACL 2025]
☆114 Updated 5 months ago
voidism / Lookback-Lens
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
☆128 Updated 11 months ago
zjunlp / OneGen
[EMNLP 2024 Findings] OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs.
☆148 Updated 8 months ago
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆206 Updated last month
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆145 Updated 8 months ago
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆80 Updated 2 months ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆228 Updated 8 months ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆219 Updated this week
jakespringer / echo-embeddings
☆151 Updated last year
ScalerLab / JudgeBench
☆87 Updated 8 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆111 Updated 5 months ago
suzgunmirac / dynamic-cheatsheet
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
☆66 Updated last month
InternLM / SWE-Fixer
☆104 Updated 2 months ago
David-Li0406 / Preference-Leakage
☆45 Updated last month
GAIR-NLP / ReAlign
Reformatted Alignment
☆113 Updated 9 months ago