princeton-nlp / WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆411 · Updated last year
Alternatives and similar repositories for WebShop
Users interested in WebShop are comparing it to the repositories listed below.
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆355 · Updated last year
- VisualWebArena is a benchmark for multimodal agents. ☆392 · Updated 11 months ago
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ☆315 · Updated last year
- An extensible benchmark for evaluating large language models on planning ☆419 · Updated last month
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆476 · Updated 6 months ago
- AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… ☆290 · Updated this week
- FireAct: Toward Language Agent Fine-tuning ☆282 · Updated 2 years ago
- Code for the paper Tree Search for Language Model Agents ☆217 · Updated last year
- ☆186 · Updated 8 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆227 · Updated last year
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆252 · Updated last year
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" ☆796 · Updated last year
- Data and Code for Program of Thoughts [TMLR 2023] ☆289 · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆262 · Updated last year
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆214 · Updated 2 years ago
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆283 · Updated last year
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆377 · Updated this week
- Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ☆303 · Updated last year
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆750 · Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels… ☆279 · Updated 2 years ago
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ☆548 · Updated 3 months ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆519 · Updated last year
- [TMLR] Cumulative Reasoning With Large Language Models (https://arxiv.org/abs/2308.04371) ☆302 · Updated 2 months ago
- RewardBench: the first evaluation tool for reward models. ☆643 · Updated 4 months ago
- Reasoning with Language Model is Planning with World Model ☆175 · Updated 2 years ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆470 · Updated last year
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆214 · Updated 2 weeks ago
- A curated list of human preference datasets for LLM fine-tuning, RLHF, and evaluation ☆380 · Updated 2 years ago
- Multi-agent social simulation + an efficient, effective, and stable alternative to RLHF. Code for the paper "Training Socially Aligned Langu…" ☆351 · Updated 2 years ago
- Build hierarchical autonomous agents through config; collaborative growth of specialized agents ☆323 · Updated last year