AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource Paper.
☆393 · Feb 17, 2026 · Updated last month
Alternatives and similar repositories for appworld
Users that are interested in appworld are comparing it to the libraries listed below.
- Code and Data for Tau-Bench ☆1,140 · Mar 18, 2026 · Updated last week
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi… ☆749 · Sep 11, 2025 · Updated 6 months ago
- (no description) ☆28 · Jan 31, 2026 · Updated last month
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆2,565 · Updated this week
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ☆222 · Apr 15, 2025 · Updated 11 months ago
- (no description) ☆34 · May 24, 2025 · Updated 10 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆401 · May 20, 2024 · Updated last year
- [NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆507 · Sep 6, 2024 · Updated last year
- [ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks ☆32 · Sep 20, 2024 · Updated last year
- This is the repository for the Tool Learning survey. ☆480 · Aug 9, 2025 · Updated 7 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆161 · Oct 30, 2024 · Updated last year
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆137 · Feb 16, 2026 · Updated last month
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in… ☆1,728 · Feb 27, 2026 · Updated last month
- Companion code to https://arxiv.org/abs/2402.15491 ☆22 · Sep 18, 2025 · Updated 6 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆513 · Jun 6, 2025 · Updated 9 months ago
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight) ☆289 · Jan 23, 2026 · Updated 2 months ago
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24. ☆17 · Feb 21, 2025 · Updated last year
- AndroidWorld is an environment and benchmark for autonomous agents ☆679 · Mar 19, 2026 · Updated last week
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents ☆231 · Jun 16, 2025 · Updated 9 months ago
- Reasoning by Communicating with Agents ☆29 · Apr 29, 2025 · Updated 10 months ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆161 · Feb 11, 2025 · Updated last year
- Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning. ☆392 · Feb 22, 2025 · Updated last year
- xLAM: A Family of Large Action Models to Empower AI Agent Systems ☆609 · Aug 21, 2025 · Updated 7 months ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ☆2,714 · Updated this week
- Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆53 · Jul 9, 2025 · Updated 8 months ago
- (no description) ☆52 · Oct 10, 2024 · Updated last year
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆541 · Mar 17, 2026 · Updated last week
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆81 · May 7, 2024 · Updated last year
- VisualWebArena is a benchmark for multimodal agents. ☆450 · Nov 9, 2024 · Updated last year
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆652 · Jul 29, 2025 · Updated 7 months ago
- Code for the paper Tree Search for Language Model Agents ☆221 · Jul 25, 2024 · Updated last year
- CIKM 2022: Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models ☆10 · Aug 4, 2022 · Updated 3 years ago
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents" ☆499 · Nov 7, 2025 · Updated 4 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆166 · Dec 17, 2024 · Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆70 · Dec 9, 2024 · Updated last year
- (no description) ☆21 · Oct 23, 2025 · Updated 5 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆259 · Apr 24, 2025 · Updated 11 months ago
- (no description) ☆31 · May 8, 2025 · Updated 10 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆20,097 · Updated this week