rsanchezmo / gym-llm
Testing LLMs reflection and planning capabilities with gym environments
☆12 · Updated last year
Alternatives and similar repositories for gym-llm
Users who are interested in gym-llm are comparing it to the libraries listed below:
- Benchmarking Agentic LLM and VLM Reasoning On Games · ☆221 · Updated last month
- ☆104 · Updated 7 months ago
- A benchmark for evaluating learning agents based on just language feedback · ☆93 · Updated 7 months ago
- WONDERBREAD benchmark + dataset for BPM tasks · ☆32 · Updated 5 months ago
- A virtual environment for developing and evaluating automated scientific discovery agents. · ☆198 · Updated 10 months ago
- ☆117 · Updated 11 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback · ☆124 · Updated 9 months ago
- Official Code Release for "Training a Generally Curious Agent" · ☆43 · Updated 7 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples · ☆113 · Updated 5 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents · ☆218 · Updated last year
- ☆133 · Updated last month
- The source code of the paper "Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Pla… · ☆107 · Updated last year
- Reasoning with Language Model is Planning with World Model · ☆185 · Updated 2 years ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] · ☆147 · Updated last year
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL · ☆63 · Updated last year
- ☆29 · Updated 10 months ago
- ☆41 · Updated last year
- ☆109 · Updated last year
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … · ☆144 · Updated last year
- Automated Capability Discovery via Foundation Model Self-Exploration · ☆66 · Updated 10 months ago
- ☆144 · Updated 5 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model · ☆161 · Updated last year
- Official Implementation of "DeLLMa: Decision Making Under Uncertainty with Large Language Models" · ☆69 · Updated last year
- ☆144 · Updated last year
- ☆226 · Updated 10 months ago
- Can Language Models Solve Olympiad Programming? · ☆124 · Updated 11 months ago
- Governance of the Commons Simulation (GovSim) · ☆64 · Updated 11 months ago
- This repository contains an LLM benchmark for the social deduction game "Resistance Avalon" · ☆136 · Updated 7 months ago
- Learn online intrinsic rewards from LLM feedback · ☆45 · Updated last year
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok … · ☆29 · Updated last month