microsoft / SmartPlay

SmartPlay is a benchmark for Large Language Models (LLMs). It uses a variety of games to test important capabilities of LLMs as agents. SmartPlay is designed to be easy to use and to support future development of LLMs.

☆140

Alternatives and similar repositories for SmartPlay

Users who are interested in SmartPlay are comparing it to the repositories listed below.

abdulhaim / LMRL-Gym
☆99 Updated last year
CraftJarvis / MC-Planner
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agen…
☆282 Updated 2 years ago
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆280 Updated 3 weeks ago
DeckardAgent / deckard
Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?"
☆94 Updated 2 years ago
flowersteam / lamorel
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
☆236 Updated 9 months ago
haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆111 Updated 4 months ago
flowersteam / Grounding_LLMs_with_online_RL
We perform functional grounding of LLMs' knowledge in BabyAI-Text
☆268 Updated 11 months ago
mindagent / mindagent
☆92 Updated last year
agentification / RAFA_code
☆143 Updated last year
BladeTransformerLLC / OvercookedGPT
An OpenAI gym environment to evaluate the ability of LLMs (e.g., GPT-4, Claude) in long-horizon reasoning and task planning in dynamic mult…
☆69 Updated 2 years ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 Updated last year
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆184 Updated 3 months ago
minaek / reward_design_with_llms
☆220 Updated 2 years ago
jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆173 Updated 8 months ago
microsoft / LLF-Bench
A benchmark for evaluating learning agents based on just language feedback
☆84 Updated last month
jlin816 / dynalang
Code for "Learning to Model the World with Language." ICML 2024 Oral.
☆387 Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆139 Updated 8 months ago
DigiRL-agent / digiq
☆109 Updated 3 months ago
amazon-science / PAE
☆60 Updated 4 months ago
UMass-Embodied-AGI / CoELA
[ICLR 2024] Source codes for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models"
☆265 Updated 4 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167 Updated 2 months ago
karthikv792 / LLMs-Planning
An extensible benchmark for evaluating large language models on planning
☆393 Updated last month
PKU-RL / Creative-Agents
☆44 Updated last year
bigai-nlco / langsuite
Official Repo of LangSuitE
☆84 Updated 11 months ago
jwhj / OREO
☆114 Updated 6 months ago
waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆92 Updated last week
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆178 Updated 2 weeks ago
amazon-science / alexa-arena
☆109 Updated last month
CraftJarvis / MC-Controller
Implementation of "Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction"
☆46 Updated last year
szxiangjn / world-model-for-language-model
☆131 Updated last year