☆170Oct 29, 2025Updated 4 months ago
Alternatives and similar repositories for ACEBench
Users who are interested in ACEBench are comparing it to the repositories listed below.
- Code and Data for Tau-Bench☆1,103Aug 28, 2025Updated 6 months ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark☆11Mar 27, 2025Updated 11 months ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models☆58Jul 24, 2025Updated 7 months ago
- ☆13Jun 16, 2021Updated 4 years ago
- ☆32May 24, 2025Updated 9 months ago
- Deep semantic role labeling using Tensorflow☆17Sep 30, 2018Updated 7 years ago
- Infrastructure for building a supervised, self-improving agent organization. Run Claude Code from Feishu & Telegram with shared memory, a…☆51Updated this week
- autonomous agent with access to a tool library☆44Feb 23, 2026Updated last week
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆705Oct 15, 2025Updated 4 months ago
- Code and data for "Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change" (EMNLP2022)☆18Dec 8, 2022Updated 3 years ago
- Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training☆20Mar 4, 2022Updated 3 years ago
- ☆52Oct 10, 2024Updated last year
- This is the repository for the Tool Learning survey.☆480Aug 9, 2025Updated 6 months ago
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.☆217Apr 15, 2025Updated 10 months ago
- Utilize the capability of GPT-4o Vision on the UHHGPT web portal☆12Aug 26, 2024Updated last year
- Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and…☆65May 16, 2025Updated 9 months ago
- Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments☆48Jan 8, 2026Updated last month
- Verifiers for LLM Reinforcement Learning☆80Apr 15, 2025Updated 10 months ago
- ☆31Jun 12, 2024Updated last year
- ☆25Oct 27, 2020Updated 5 years ago
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents☆597Updated this week
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL☆4,085Nov 13, 2025Updated 3 months ago
- This is the source code for HufuNet. Our paper is accepted by the IEEE TDSC.☆27Aug 21, 2023Updated 2 years ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.☆2,522Updated this week
- Code for the paper "NetTaxo: Automated Topic Taxonomy Construction from Text-Rich Network"☆32Feb 23, 2022Updated 4 years ago
- [ICLR 2026] Agentic Reinforced Policy Optimization (ARPO)☆892Jan 28, 2026Updated last month
- ☆1,584Jan 20, 2026Updated last month
- Supporting code for ReCEval paper☆31Sep 14, 2024Updated last year
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆358Jan 12, 2026Updated last month
- OpenAI library for Crystal, providing an interface to interact with various OpenAI services.☆12Jan 17, 2024Updated 2 years ago
- An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel…☆11Jan 23, 2026Updated last month
- Knowledge graph extraction from text using OpenAI ChatGPT for graph extraction and Neo4j for DB storage☆11Feb 26, 2024Updated 2 years ago
- ☆84Sep 11, 2024Updated last year
- MiroRL is an MCP-first reinforcement learning framework for deep research agents.☆233Aug 27, 2025Updated 6 months ago
- The repository for "MedChain: Bridging the Gap Between LLM Agents and Real-World Clinical Decision Making"☆44Oct 10, 2025Updated 4 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs☆19,519Updated this week
- Simple RL training for reasoning☆3,830Dec 23, 2025Updated 2 months ago
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the…☆190Jul 6, 2025Updated 7 months ago
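- A data development platform contributed by APEX and built on big-data platform capabilities. It helps enterprises connect data at minimal cost, build and consolidate data warehouse models, lower the barrier to data applications, and accumulate data value.☆12Oct 31, 2024Updated last year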