allenai / ScienceWorld

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.

☆280

Alternatives and similar repositories for ScienceWorld

Users who are interested in ScienceWorld are comparing it to the repositories listed below

microsoft / SmartPlay
SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …
☆140 · Updated last year
abdulhaim / LMRL-Gym
☆99 · Updated last year
CraftJarvis / MC-Planner
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agen…
☆282 · Updated 2 years ago
karthikv792 / LLMs-Planning
An extensible benchmark for evaluating large language models on planning
☆393 · Updated last month
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 · Updated last year
haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆111 · Updated 4 months ago
flowersteam / lamorel
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
☆236 · Updated 9 months ago
SwiftSage / SwiftSage
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
☆311 · Updated 9 months ago
alfworld / alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
☆499 · Updated 2 weeks ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆185 · Updated 3 months ago
sotopia-lab / sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
☆233 · Updated last week
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆139 · Updated 8 months ago
StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆232 · Updated 2 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167 · Updated 2 months ago
agentification / RAFA_code
☆143 · Updated last year
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆335 · Updated last year
jlin816 / dialop
DialOp: Decision-oriented dialogue environments for collaborative language agents
☆109 · Updated 8 months ago
flowersteam / Grounding_LLMs_with_online_RL
We perform functional grounding of LLMs' knowledge in BabyAI-Text
☆268 · Updated 11 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆147 · Updated 9 months ago
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆223 · Updated last year
composable-models / llm_multiagent_debate
ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate
☆459 · Updated 3 months ago
jonathanmli / Avalon-LLM
This repository contains an LLM benchmark for the social deduction game `Resistance Avalon'
☆120 · Updated 2 months ago
minaek / reward_design_with_llms
☆220 · Updated 2 years ago
DeckardAgent / deckard
Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?"
☆94 · Updated 2 years ago
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆278 · Updated last year
microsoft / LLF-Bench
A benchmark for evaluating learning agents based on just language feedback
☆86 · Updated last month
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆179 · Updated 2 weeks ago
princeton-nlp / WebShop
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆379 · Updated 11 months ago
rxlqn / awesome-llm-self-reflection
augmented LLM with self reflection
☆129 · Updated last year
gabegrand / world-models
☆208 · Updated 2 years ago