sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆558 Updated last week
Alternatives and similar repositories for oat
Users that are interested in oat are comparing it to the libraries listed below
- RewardBench: the first evaluation tool for reward models. ☆649 Updated 5 months ago
- Reproducible, flexible LLM evaluations ☆264 Updated 2 weeks ago
- A project to improve skills of large language models ☆608 Updated this week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆269 Updated 3 weeks ago
- ☆326 Updated 5 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆304 Updated 2 weeks ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,148 Updated 2 months ago
- Code for the paper: "Learning to Reason without External Rewards" ☆370 Updated 4 months ago
- A Gym for Agentic LLMs ☆352 Updated this week
- Tina: Tiny Reasoning Models via LoRA ☆304 Updated last month
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆372 Updated 3 weeks ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆249 Updated 6 months ago
- Automatic evals for LLMs ☆556 Updated 4 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆248 Updated 6 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆353 Updated 11 months ago
- ☆995 Updated 4 months ago
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ☆1,214 Updated last month
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆443 Updated last year
- A simple unified framework for evaluating LLMs ☆254 Updated 6 months ago
- ☆197 Updated 6 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆336 Updated 11 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆551 Updated last month
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,170 Updated this week
- Minimal hackable GRPO implementation ☆300 Updated 9 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ☆144 Updated 9 months ago
- ☆215 Updated 7 months ago
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… ☆348 Updated this week
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training ☆284 Updated last year
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆261 Updated last year
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆568 Updated 3 months ago