princeton-nlp / WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆454 · Updated last year
Alternatives and similar repositories for WebShop
Users that are interested in WebShop are comparing it to the libraries listed below
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆384 · Updated last year
- An extensible benchmark for evaluating large language models on planning ☆438 · Updated 3 months ago
- VisualWebArena is a benchmark for multimodal agents. ☆423 · Updated last year
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ☆323 · Updated last year
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆503 · Updated 8 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆218 · Updated last year
- FireAct: Toward Language Agent Fine-tuning ☆289 · Updated 2 years ago
- ☆186 · Updated 11 months ago
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆768 · Updated last year
- AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… ☆354 · Updated last month
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆233 · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆267 · Updated last year
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆540 · Updated last year
- Paper collection on building and evaluating language model agents via executable language grounding ☆363 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆675 · Updated 7 months ago
- Reasoning with Language Model is Planning with World Model ☆185 · Updated 2 years ago
- Data and Code for Program of Thoughts [TMLR 2023] ☆302 · Updated last year
- [TMLR] Cumulative Reasoning With Large Language Models (https://arxiv.org/abs/2308.04371) ☆308 · Updated 5 months ago
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆383 · Updated last month
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆134 · Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆284 · Updated 2 years ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆218 · Updated 2 years ago
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆168 · Updated last year
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu… ☆353 · Updated 2 years ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆254 · Updated last year
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ☆614 · Updated 5 months ago
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. ☆384 · Updated 2 years ago
- Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ☆321 · Updated last year
- An implementation of Everything of Thoughts (XoT). ☆155 · Updated last year
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" ☆813 · Updated last year