princeton-nlp / WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆436 · Updated last year
Alternatives and similar repositories for WebShop
Users who are interested in WebShop are comparing it to the repositories listed below.
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆366 · Updated last year
- VisualWebArena is a benchmark for multimodal agents. ☆409 · Updated last year
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ☆320 · Updated last year
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆491 · Updated 7 months ago
- An extensible benchmark for evaluating large language models on planning ☆431 · Updated 2 months ago
- FireAct: Toward Language Agent Fine-tuning ☆286 · Updated 2 years ago
- AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… ☆324 · Updated 2 weeks ago
- Code for the paper Tree Search for Language Model Agents ☆216 · Updated last year
- ☆185 · Updated 10 months ago
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆758 · Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆282 · Updated 2 years ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆230 · Updated last year
- Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ☆308 · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆264 · Updated last year
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆530 · Updated last year
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" ☆804 · Updated last year
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆255 · Updated last year
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents ☆555 · Updated 2 years ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆218 · Updated 2 years ago
- [TMLR] Cumulative Reasoning With Large Language Models (https://arxiv.org/abs/2308.04371) ☆306 · Updated 4 months ago
- Data and Code for Program of Thoughts [TMLR 2023] ☆292 · Updated last year
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆379 · Updated last week
- This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback. ☆558 · Updated last year
- xLAM: A Family of Large Action Models to Empower AI Agent Systems ☆584 · Updated 3 months ago
- Reasoning with Language Model is Planning with World Model ☆180 · Updated 2 years ago
- A large-scale, fine-grained, diverse preference dataset (and models). ☆356 · Updated last year
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆208 · Updated 2 years ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆134 · Updated last year
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆181 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆660 · Updated 5 months ago