princeton-nlp / WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
★ 478 · Updated last year
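WebShop frames web shopping as a sequential decision task: the agent reads the current page as text (or HTML) and issues actions such as search[...] and click[...] until it purchases a product. The sketch below shows what a gym-style interaction loop could look like; the import path, the `WebAgentTextEnv` class, its constructor arguments, and `get_available_actions` are assumptions drawn from the paper's description of the environment, not the repo's confirmed API.

```python
# Minimal sketch: a random agent in WebShop's text environment.
# All names below (import path, WebAgentTextEnv, observation_mode,
# get_available_actions) are assumptions; check the WebShop README
# for the actual interface.
import random

from web_agent_site.envs import WebAgentTextEnv  # assumed import path

env = WebAgentTextEnv(observation_mode="text")   # assumed constructor
obs = env.reset()                                # instruction + current page as text

done, reward = False, 0.0
while not done:
    # WebShop actions are strings such as "search[red running shoes]"
    # or "click[buy now]"; here we issue a fixed query when a search bar
    # is present and otherwise click a random valid element.
    valid = env.get_available_actions()          # assumed helper
    if valid.get("has_search_bar"):
        action = "search[red running shoes]"     # hard-coded query for the sketch
    else:
        action = f"click[{random.choice(valid['clickables'])}]"
    obs, reward, done, info = env.step(action)

print("episode reward:", reward)
```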
Alternatives and similar repositories for WebShop
Users interested in WebShop are comparing it to the libraries listed below.
- VisualWebArena is a benchmark for multimodal agents. ★ 431 · Updated last year
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ★ 390 · Updated last year
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ★ 506 · Updated 9 months ago
- AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… ★ 371 · Updated 2 months ago
- FireAct: Toward Language Agent Fine-tuning ★ 292 · Updated 2 years ago
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ★ 324 · Updated last year
- An extensible benchmark for evaluating large language models on planning ★ 448 · Updated 4 months ago
- Code for the paper Tree Search for Language Model Agents ★ 219 · Updated last year
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ★ 780 · Updated last year
- [TMLR] Cumulative Reasoning With Large Language Models (https://arxiv.org/abs/2308.04371) ★ 308 · Updated 6 months ago
- Data and Code for Program of Thoughts [TMLR 2023] ★ 306 · Updated last year
- RewardBench: the first evaluation tool for reward models. ★ 687 · Updated last week
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ★ 220 · Updated 2 years ago
- ★ 186 · Updated last year
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ★ 387 · Updated 2 months ago
- Paper collection on building and evaluating language model agents via executable language grounding ★ 364 · Updated last year
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ★ 255 · Updated last year
- Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ★ 325 · Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels… ★ 285 · Updated 2 years ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ★ 133 · Updated last year
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ★ 336 · Updated 2 months ago
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ★ 270 · Updated last year
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents" ★ 471 · Updated 3 months ago
- Code for the arXiv 2023 paper: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ★ 207 · Updated 2 years ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ★ 202 · Updated 9 months ago
- Multi-agent social simulation + an efficient, effective, and stable alternative to RLHF. Code for the paper "Training Socially Aligned Langu… ★ 354 · Updated 2 years ago
- A large-scale, fine-grained, diverse preference dataset (and models). ★ 361 · Updated 2 years ago
- A benchmark list for evaluating large language models. ★ 159 · Updated 3 weeks ago
- (ICML 2024) AlphaZero-like tree search can guide large language model decoding and training ★ 285 · Updated last year
- [NeurIPS 2023 D&B] Code repository for the InterCode benchmark (https://arxiv.org/abs/2306.14898) ★ 240 · Updated last year