ServiceNow / BrowserGym
BrowserGym, a gym environment for web task automation in the Chromium browser.
Related projects:
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
- VisualWebArena is a benchmark for multimodal agents.
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
- Code for the paper 🌳 Tree Search for Language Model Agents
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan…
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and …
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…
- Agentless🐱: an agentless approach to automatically solve software development problems
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
- An Analytical Evaluation Board of Multi-turn LLM Agents
- Implementation of Google's SELF-DISCOVER
- Code for Quiet-STaR
- [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
- Code and Data for Tau-Bench
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
- AWM: Agent Workflow Memory
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
- A simple unified framework for evaluating LLMs
- Benchmarks, environments, and toolkits for general computer agents
- Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
- A codebase for "Language Models can Solve Computer Tasks"
- Build Hierarchical Autonomous Agents through Config. Collaborative Growth of Specialized Agents.
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
- NexusRaven-13B, a new SOTA Open-Source LLM for function calling. This repo contains everything for reproducing our evaluation on NexusRav…
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models"
- Attribute (or cite) statements generated by LLMs back to in-context information.