princeton-nlp / WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆326 Updated 7 months ago
Alternatives and similar repositories for WebShop:
Users that are interested in WebShop are comparing it to the libraries listed below
- VisualWebArena is a benchmark for multimodal agents. ☆332 Updated 5 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆303 Updated 10 months ago
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ☆307 Updated 5 months ago
- Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆179 Updated this week
- An extensible benchmark for evaluating large language models on planning ☆343 Updated last week
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆422 Updated last year
- ☆179 Updated 2 months ago
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆205 Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆150 Updated last year
- FireAct: Toward Language Agent Fine-tuning ☆275 Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆257 Updated last year
- Simple next-token-prediction for RLHF ☆224 Updated last year
- ☆172 Updated last year
- Reasoning with Language Model is Planning with World Model ☆163 Updated last year
- Code for the paper 🌳 Tree Search for Language Model Agents ☆192 Updated 8 months ago
- A codebase for "Language Models can Solve Computer Tasks" ☆234 Updated 11 months ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆203 Updated 2 years ago
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆262 Updated 11 months ago
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… ☆446 Updated last month
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆212 Updated 11 months ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆253 Updated 5 months ago
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆296 Updated this week
- Paper collection on building and evaluating language model agents via executable language grounding ☆352 Updated 11 months ago
- Data and Code for Program of Thoughts (TMLR 2023) ☆267 Updated 11 months ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆481 Updated 9 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆463 Updated last year
- An implementation of Everything of Thoughts (XoT). ☆141 Updated last year
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆106 Updated 2 weeks ago
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆218 Updated last year
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning ☆219 Updated 3 months ago