☆176 · Oct 29, 2025 · Updated 6 months ago
Alternatives and similar repositories for ACEBench
Users that are interested in ACEBench are comparing it to the libraries listed below.
- τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains ☆1,185 · May 15, 2026 · Updated last week
- Math24o: High School Olympiad Mathematics Chinese Benchmark ☆12 · Mar 27, 2025 · Updated last year
- ☆35 · May 24, 2025 · Updated last year
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆60 · Jul 24, 2025 · Updated 10 months ago
- ☆513 · Oct 11, 2025 · Updated 7 months ago
- ☆28 · Apr 16, 2024 · Updated 2 years ago
- This is the repository for the Tool Learning survey. ☆483 · Aug 9, 2025 · Updated 9 months ago
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in… ☆1,909 · Feb 27, 2026 · Updated 2 months ago
- Code and data for "Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change" (EMNLP2022) ☆18 · Dec 8, 2022 · Updated 3 years ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆754 · May 10, 2026 · Updated 2 weeks ago
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ☆234 · Apr 15, 2025 · Updated last year
- Data mapping framework for rust stuff ☆53 · Mar 25, 2026 · Updated 2 months ago
- [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling ☆22 · Dec 16, 2024 · Updated last year
- [ICLR 2026] Agentic Reinforced Policy Optimization (ARPO) ☆1,010 · Apr 13, 2026 · Updated last month
- Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and… ☆68 · May 16, 2025 · Updated last year
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents ☆77 · Apr 23, 2026 · Updated last month
- ☆492 · Oct 16, 2025 · Updated 7 months ago
- Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training ☆20 · Mar 4, 2022 · Updated 4 years ago
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents ☆730 · May 11, 2026 · Updated 2 weeks ago
- UFT: Unifying Supervised and Reinforcement Fine-Tuning ☆30 · Jun 30, 2025 · Updated 10 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆2,668 · Apr 14, 2026 · Updated last month
- ☆64 · Dec 10, 2025 · Updated 5 months ago
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆34 · Dec 24, 2025 · Updated 5 months ago
- create timer videos at any speed. ☆15 · Sep 25, 2023 · Updated 2 years ago
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL ☆4,753 · Nov 13, 2025 · Updated 6 months ago
- ☆50 · Sep 6, 2023 · Updated 2 years ago
- ☆19 · Jun 25, 2024 · Updated last year
- Supporting code for ReCEval paper ☆32 · Sep 14, 2024 · Updated last year
- ☆46 · Apr 17, 2026 · Updated last month
- ☆53 · Oct 10, 2024 · Updated last year
- ☆1,774 · Jan 20, 2026 · Updated 4 months ago
- ☆248 · Nov 7, 2025 · Updated 6 months ago
- Multi-turn RL framework for aligning models to be tutors instead of answerers. EMNLP 2025 Oral ☆38 · Dec 11, 2025 · Updated 5 months ago
- Verifiers for LLM Reinforcement Learning ☆80 · Apr 15, 2025 · Updated last year
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆21,514 · Updated this week
- [ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal… ☆446 · Apr 7, 2026 · Updated last month
- ☆16 · Jun 1, 2023 · Updated 2 years ago
- [ICML 2025] Logits are All We Need to Adapt Closed Models ☆23 · May 2, 2025 · Updated last year
- [EMNLP 2025 Findings] Familiarity-aware Evidence Compression for Retrieval Augmented Generation ☆15 · Aug 20, 2025 · Updated 9 months ago