ServiceNow / AgentLab
☆35 · Updated this week
Related projects
Alternatives and complementary repositories for AgentLab
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆123 · Updated last week
- BrowserGym, a gym environment for web task automation in the Chromium browser. ☆316 · Updated this week
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆115 · Updated this week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆106 · Updated 2 weeks ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆213 · Updated 3 weeks ago
- ☆38 · Updated 3 months ago
- A benchmark for evaluating learning agents based on just language feedback ☆56 · Updated last month
- Repository for the paper Stream of Search: Learning to Search in Language ☆84 · Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- Code for the paper 🌳 Tree Search for Language Model Agents ☆138 · Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆119 · Updated 2 weeks ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆160 · Updated last month
- ☆50 · Updated 10 months ago
- [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆273 · Updated 2 months ago
- Can Language Models Solve Olympiad Programming? ☆100 · Updated 3 months ago
- Code and Data for Tau-Bench ☆193 · Updated 2 weeks ago
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆71 · Updated 10 months ago
- VisualWebArena is a benchmark for multimodal agents. ☆235 · Updated last month
- ☆122 · Updated last week
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆194 · Updated 6 months ago
- A repository for transformer critique learning and generation ☆85 · Updated 11 months ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆98 · Updated 4 months ago
- An extensible benchmark for evaluating large language models on planning ☆288 · Updated 5 months ago
- ☆73 · Updated 4 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆94 · Updated 2 weeks ago
- Official Repo for UGround ☆93 · Updated this week
- ☆99 · Updated 3 months ago
- AWM: Agent Workflow Memory ☆203 · Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆38 · Updated 2 weeks ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆49 · Updated 2 months ago