ServiceNow / WorkArena
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Related projects:
- BrowserGym, a gym environment for web task automation in the Chromium browser.
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
- Code and Data for Tau-Bench
- DialOp: Decision-oriented dialogue environments for collaborative language agents
- A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
- Code for the paper 🌳 Tree Search for Language Model Agents
- Attribute (or cite) statements generated by LLMs back to in-context information.
- A repository for transformer critique learning and generation
- VisualWebArena is a benchmark for multimodal agents.
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
- A set of utilities for running few-shot prompting experiments on large language models
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
- Codebase accompanying the Summary of a Haystack paper.
- [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
- Self-Alignment with Principle-Following Reward Models
- Functional Benchmarks and the Reasoning Gap
- Can Language Models Solve Olympiad Programming?
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
- A benchmark that challenges language models to code solutions for scientific problems
- PASTA: Post-hoc Attention Steering for LLMs