ServiceNow / AgentLab
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
☆235 · Updated this week
Alternatives and similar repositories for AgentLab:
Users who are interested in AgentLab are comparing it to the repositories listed below:
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆160 · Updated this week
- 🌎💪 BrowserGym, a Gym environment for web task automation ☆527 · Updated 2 weeks ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆207 · Updated this week
- AWM: Agent Workflow Memory ☆241 · Updated 3 weeks ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆178 · Updated 6 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆145 · Updated 2 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆354 · Updated last month
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆168 · Updated this week
- VisualWebArena is a benchmark for multimodal agents. ☆295 · Updated 3 months ago
- An agent benchmark with tasks in a simulated software company. ☆243 · Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆161 · Updated last week
- Code and Data for Tau-Bench ☆272 · Updated 3 weeks ago
- Automatic Evals for LLMs ☆266 · Updated this week
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆279 · Updated this week
- ☆362 · Updated last month
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆302 · Updated last week
- This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects. ☆229 · Updated this week
- 🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ ☆213 · Updated 2 months ago
- ☆349 · Updated 2 weeks ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆141 · Updated last week
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆281 · Updated 9 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆208 · Updated 9 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆462 · Updated 11 months ago
- A simple unified framework for evaluating LLMs ☆197 · Updated 2 weeks ago
- ☆164 · Updated last month
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction ☆221 · Updated last month
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆289 · Updated 3 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆411 · Updated 4 months ago
- Building a comprehensive and handy list of papers for GUI agents ☆213 · Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆167 · Updated last month