☆187 · Oct 29, 2025 · Updated 7 months ago
Alternatives and similar repositories for ACEBench
Users that are interested in ACEBench are comparing it to the libraries listed below.
- τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains ☆1,315 · Jun 3, 2026 · Updated last week
- ☆36 · May 24, 2025 · Updated last year
- ☆513 · Oct 11, 2025 · Updated 8 months ago
- Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem. ☆30 · May 26, 2024 · Updated 2 years ago
- ☆28 · Apr 16, 2024 · Updated 2 years ago
- This is the repository for the Tool Learning survey. ☆484 · Aug 9, 2025 · Updated 10 months ago
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in… ☆1,991 · Updated this week
- Code and data for "Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change" (EMNLP2022) ☆18 · Dec 8, 2022 · Updated 3 years ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆760 · May 10, 2026 · Updated last month
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ☆235 · Apr 15, 2025 · Updated last year
- [ICLR 2026] Agentic Reinforced Policy Optimization (ARPO) ☆1,035 · Apr 13, 2026 · Updated 2 months ago
- Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and… ☆69 · May 16, 2025 · Updated last year
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆1,470 · Updated this week
- Multi-agent synthetic data generation pipeline capable of generating and validating long horizon terminal/coding tasks for RL training ☆64 · Jul 28, 2025 · Updated 10 months ago
- ☆503 · Oct 16, 2025 · Updated 7 months ago
- Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training ☆20 · Mar 4, 2022 · Updated 4 years ago
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents ☆747 · May 11, 2026 · Updated last month
- UFT: Unifying Supervised and Reinforcement Fine-Tuning ☆30 · Jun 30, 2025 · Updated 11 months ago
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents ☆93 · Apr 23, 2026 · Updated last month
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆2,699 · Apr 14, 2026 · Updated 2 months ago
- ☆65 · Dec 10, 2025 · Updated 6 months ago
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆34 · Dec 24, 2025 · Updated 5 months ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin… ☆14 · Jul 27, 2025 · Updated 10 months ago
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL ☆4,925 · Nov 13, 2025 · Updated 7 months ago
- This is the source code for HufuNet. Our paper is accepted by the IEEE TDSC. ☆27 · Aug 21, 2023 · Updated 2 years ago
- ☆50 · Sep 6, 2023 · Updated 2 years ago
- ☆47 · Apr 17, 2026 · Updated last month
- ☆53 · Oct 10, 2024 · Updated last year
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- Codebase for Hyperdecoders https://arxiv.org/abs/2203.08304 ☆14 · Oct 11, 2022 · Updated 3 years ago
- ☆1,795 · Jan 20, 2026 · Updated 4 months ago
- ☆12 · Sep 14, 2023 · Updated 2 years ago
- Verifiers for LLM Reinforcement Learning ☆80 · Apr 15, 2025 · Updated last year
- [ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal… ☆452 · Apr 7, 2026 · Updated 2 months ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆21,850 · Updated this week
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning ☆120 · Feb 2, 2026 · Updated 4 months ago
- [ICML 2025] Logits are All We Need to Adapt Closed Models ☆23 · May 2, 2025 · Updated last year
- Self-Teaching Notes on Gradient Leakage Attacks against GPT-2 models. ☆14 · Mar 18, 2024 · Updated 2 years ago
- [ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning. ☆5,665 · May 21, 2025 · Updated last year