StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL'24 Best Resource Paper.
☆187Updated this week
Alternatives and similar repositories for appworld:
Users who are interested in appworld are comparing it to the repositories listed below
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆186Updated 3 weeks ago
- Code for the paper 🌳 Tree Search for Language Model Agents☆197Updated 9 months ago
- A benchmark list for evaluation of large language models.☆102Updated last week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆135Updated 5 months ago
- ☆287Updated last month
- ☆163Updated last month
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.☆145Updated 3 weeks ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆123Updated 11 months ago
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"☆69Updated 3 weeks ago
- ☆111Updated this week
- ☆150Updated 4 months ago
- AWM: Agent Workflow Memory☆268Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆138Updated 6 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization☆139Updated 11 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]☆309Updated 11 months ago
- Code and example data for the paper: Rule Based Rewards for Language Model Safety☆186Updated 9 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users☆221Updated 6 months ago
- Building a comprehensive and handy list of papers for GUI agents☆313Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning☆150Updated this week
- Towards Large Multimodal Models as Visual Foundation Agents☆209Updated last week
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆306Updated 8 months ago
- ☆109Updated 3 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆167Updated 2 weeks ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆101Updated 3 months ago
- Repo of paper "Free Process Rewards without Process Labels"☆145Updated last month
- FireAct: Toward Language Agent Fine-tuning☆275Updated last year
- ☆121Updated 10 months ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)☆205Updated 2 years ago
- ☆227Updated 8 months ago
- VisualWebArena is a benchmark for multimodal agents.☆334Updated 5 months ago