princeton-nlp / WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆394 · Updated last year
Alternatives and similar repositories for WebShop
Users who are interested in WebShop are comparing it to the libraries listed below:
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ☆314 · Updated 10 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆346 · Updated last year
- VisualWebArena is a benchmark for multimodal agents. ☆372 · Updated 10 months ago
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆463 · Updated 4 months ago
- Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆245 · Updated last month
- FireAct: Toward Language Agent Fine-tuning ☆282 · Updated last year
- An extensible benchmark for evaluating large language models on planning ☆402 · Updated 2 months ago
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆734 · Updated 11 months ago
- ☆183 · Updated 7 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆213 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆630 · Updated 3 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆223 · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆263 · Updated last year
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu… ☆353 · Updated 2 years ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆247 · Updated last year
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆371 · Updated last week
- Official implementation of TMLR paper "Cumulative Reasoning With Large Language Models" (https://arxiv.org/abs/2308.04371) ☆301 · Updated last month
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ☆521 · Updated last month
- Data and Code for Program of Thoughts [TMLR 2023] ☆285 · Updated last year
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆511 · Updated last year
- Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ☆296 · Updated 10 months ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆209 · Updated 2 years ago
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆399 · Updated last week
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆279 · Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆277 · Updated 2 years ago
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆160 · Updated last year
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆148 · Updated 8 months ago
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆207 · Updated 2 years ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆289 · Updated last month
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" ☆784 · Updated last year