TextArena / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆339 · Updated this week
Alternatives and similar repositories for TextArena
Users who are interested in TextArena are comparing it to the libraries listed below.
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆609 · Updated this week
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆341 · Updated 2 months ago
- A Gym for Agentic LLMs ☆420 · Updated last week
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… ☆410 · Updated last month
- Open source interpretability artefacts for R1. ☆165 · Updated 8 months ago
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆221 · Updated last month
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆254 · Updated 8 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆152 · Updated 11 months ago
- Code for the paper: "Learning to Reason without External Rewards" ☆385 · Updated 6 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆183 · Updated 7 months ago
- ☆116 · Updated last week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆234 · Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 · Updated 10 months ago
- ☆117 · Updated 11 months ago
- A simple unified framework for evaluating LLMs ☆258 · Updated 8 months ago
- ☆202 · Updated 8 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆174 · Updated 11 months ago
- Reproducible, flexible LLM evaluations ☆316 · Updated last month
- ☆123 · Updated 10 months ago
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆284 · Updated last year
- ☆226 · Updated 10 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆392 · Updated last month
- ☆109 · Updated last year
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆117 · Updated last month
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆54 · Updated last year
- Can Language Models Solve Olympiad Programming? ☆124 · Updated 11 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆271 · Updated 2 months ago
- ☆213 · Updated 2 weeks ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆225 · Updated last month
- Curated collection of community environments ☆200 · Updated last week