LeonGuertler / TextArena

A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning

☆330

Alternatives and similar repositories for TextArena

Users who are interested in TextArena are comparing it to the libraries listed below.

axon-rl / gem
A Gym for Agentic LLMs
☆409 Updated this week
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆165 Updated 8 months ago
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆217 Updated 3 weeks ago
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆340 Updated last month
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆582 Updated last month
facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…
☆405 Updated last month
facebookresearch / sweet_rl
Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
☆254 Updated 7 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆183 Updated 6 months ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆305 Updated last month
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆383 Updated 5 months ago
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆234 Updated 5 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆152 Updated 10 months ago
LeonGuertler / UnstableBaselines
☆115 Updated 2 weeks ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆190 Updated 9 months ago
da03 / Internalize_CoT_Step_by_Step
☆200 Updated 8 months ago
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆257 Updated 8 months ago
SalesforceAIResearch / LaTRO
☆125 Updated 10 months ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆223 Updated last week
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆283 Updated last year
princeton-nlp / USACO
Can Language Models Solve Olympiad Programming?
☆123 Updated 11 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆282 Updated 2 months ago
spiral-rl / spiral
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
☆171 Updated 3 months ago
jwhj / OREO
☆116 Updated 11 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆174 Updated 11 months ago
knoveleng / open-rs
Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
☆270 Updated 2 months ago
METR / RE-Bench
☆124 Updated 2 months ago
ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆123 Updated last year
PrimeIntellect-ai / prime-environments
Curated collection of community environments
☆195 Updated last week
StonyBrookNLP / appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource…
☆346 Updated last month
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆385 Updated last month