THUDM / ComplexFuncBench
Complex Function Calling Benchmark.
☆109 · Updated 4 months ago
Alternatives and similar repositories for ComplexFuncBench
Users who are interested in ComplexFuncBench are comparing it to the repositories listed below
- Official repository for paper "ReasonIR: Training Retrievers for Reasoning Tasks". ☆162 · Updated last month
- ☆118 · Updated 9 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆223 · Updated 7 months ago
- Verifiers for LLM Reinforcement Learning ☆55 · Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆151 · Updated last month
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆117 · Updated this week
- Reproducible, flexible LLM evaluations ☆204 · Updated 3 weeks ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆135 · Updated 6 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆198 · Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? ☆125 · Updated 9 months ago
- The HELMET Benchmark ☆149 · Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆78 · Updated 8 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 · Updated 11 months ago
- ☆80 · Updated 2 weeks ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆184 · Updated 2 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆201 · Updated 3 weeks ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆202 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆142 · Updated 7 months ago
- ☆79 · Updated 4 months ago
- Reformatted Alignment ☆114 · Updated 8 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆207 · Updated 3 weeks ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆143 · Updated 8 months ago
- The first dense retrieval model that can be prompted like an LM ☆73 · Updated 3 weeks ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆78 · Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆220 · Updated 7 months ago
- Evaluating LLMs with fewer examples ☆155 · Updated last year
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆53 · Updated last week
- ☆109 · Updated 2 months ago