SmartPlay is a benchmark for Large Language Models (LLMs) that uses a variety of games to test important capabilities of LLMs acting as agents. It is designed to be easy to use and to support future development of LLMs.
☆146 · Apr 11, 2024 · Updated last year
Alternatives and similar repositories for SmartPlay
Users interested in SmartPlay are comparing it to the libraries listed below.
- ☆15 · Mar 26, 2024 · Updated last year
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆337 · Dec 3, 2025 · Updated 2 months ago
- Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs). ☆244 · Dec 11, 2025 · Updated 2 months ago
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆34 · Oct 25, 2024 · Updated last year
- ☆330 · Jun 19, 2024 · Updated last year
- Experiments for performing empirical game-theoretic analysis of networked system control for common-pool resource management using multi-… ☆18 · Oct 11, 2020 · Updated 5 years ago
- Verlog: A Multi-turn RL framework for LLM agents ☆68 · Updated this week
- This is the repository for the paper EscapeBench: Pushing Language Models to Think Outside the Box ☆18 · Dec 19, 2024 · Updated last year
- ☆19 · Oct 27, 2025 · Updated 4 months ago
- ☆19 · Jul 24, 2025 · Updated 7 months ago
- ☆89 · Aug 21, 2023 · Updated 2 years ago
- Measuring General Intelligence With Generated Games (Preprint) ☆25 · Jul 30, 2025 · Updated 7 months ago
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ☆648 · Feb 8, 2026 · Updated 3 weeks ago
- A2C for GVG-AI ☆23 · Nov 7, 2018 · Updated 7 years ago
- Directed masked autoencoders ☆14 · Feb 20, 2026 · Updated last week
- ☆29 · Mar 22, 2024 · Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,187 · Feb 8, 2026 · Updated 3 weeks ago
- Standalone library of frequently-used wrappers for dm_env environments. ☆18 · Jul 9, 2024 · Updated last year
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1… ☆14 · Jan 19, 2024 · Updated 2 years ago
- ☆11 · Jun 21, 2025 · Updated 8 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024) ☆50 · Apr 19, 2024 · Updated last year
- Official repository for the paper "Automating Continual Learning" ☆18 · Jun 11, 2025 · Updated 8 months ago
- Code for a model-based version of Constrained Policy Optimization ☆11 · May 6, 2021 · Updated 4 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- Neural network backend for training and inference for animal pose estimation. ☆15 · Feb 20, 2026 · Updated last week
- ☆11 · Apr 21, 2025 · Updated 10 months ago
- ☆14 · May 9, 2024 · Updated last year
- ☆12 · Feb 16, 2024 · Updated 2 years ago
- ☆12 · Aug 30, 2021 · Updated 4 years ago
- We perform functional grounding of LLMs' knowledge in BabyAI-Text ☆276 · Oct 27, 2025 · Updated 4 months ago
- Data and code for the EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations" ☆13 · May 8, 2023 · Updated 2 years ago
- Docker containers of baseline agents for the Crafter environment ☆30 · Dec 14, 2021 · Updated 4 years ago
- This code accompanies the paper "Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration." ☆37 · Jul 11, 2025 · Updated 7 months ago
- A simple and efficient llama3 local service deployment solution that supports real-time streaming response and is optimized for common Ch… ☆13 · Jul 31, 2024 · Updated last year
- ☆15 · May 11, 2023 · Updated 2 years ago
- AI Developer Plugin for Eclipse ☆13 · May 17, 2024 · Updated last year
- Benchmarking RL generalization in an interpretable way. ☆175 · Nov 20, 2025 · Updated 3 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆241 · May 5, 2024 · Updated last year
- A beginner-friendly repository on Deep Reinforcement Learning (RL), written in PyTorch. ☆26 · Jan 27, 2026 · Updated last month