princeton-nlp / WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
475 stars · Updated last year
Alternatives and similar repositories for WebShop
Users who are interested in WebShop are comparing it to the libraries listed below.
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] · 389 stars · Updated last year
- VisualWebArena is a benchmark for multimodal agents. · 429 stars · Updated last year
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate · 503 stars · Updated 9 months ago
- FireAct: Toward Language Agent Fine-tuning · 292 stars · Updated 2 years ago
- An extensible benchmark for evaluating large language models on planning · 445 stars · Updated 4 months ago
- AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… · 367 stars · Updated 2 months ago
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks · 324 stars · Updated last year
- Code for the paper "Tree Search for Language Model Agents" · 219 stars · Updated last year
- 186 stars · Updated last year
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. · 776 stars · Updated last year
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) · 255 stars · Updated last year
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" · 816 stars · Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 · 238 stars · Updated last year
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) · 220 stars · Updated 2 years ago
- Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" · 323 stars · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) · 268 stars · Updated last year
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) · 385 stars · Updated 2 months ago
- Papers related to LLM agents that were published at top conferences · 321 stars · Updated 9 months ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them · 544 stars · Updated last year
- Paper collection on building and evaluating language model agents via executable language grounding · 364 stars · Updated last year
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. · 333 stars · Updated last month
- Data and Code for Program of Thoughts [TMLR 2023] · 303 stars · Updated last year
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning · 627 stars · Updated 6 months ago
- RewardBench: the first evaluation tool for reward models. · 683 stars · Updated 2 weeks ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… · 132 stars · Updated last year
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training · 285 stars · Updated last year
- Reasoning with Language Model is Planning with World Model · 185 stars · Updated 2 years ago
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. · 172 stars · Updated last year
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? · 230 stars · Updated last week
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" · 202 stars · Updated 9 months ago