sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
★472 · Updated 2 weeks ago
Alternatives and similar repositories for oat
Users interested in oat are comparing it to the libraries listed below.
- ★318 · Updated 4 months ago
- Reproducible, flexible LLM evaluations ★250 · Updated 2 months ago
- RewardBench: the first evaluation tool for reward models. ★638 · Updated 3 months ago
- A simple unified framework for evaluating LLMs ★248 · Updated 5 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ★261 · Updated 4 months ago
- ★211 · Updated 7 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ★244 · Updated 5 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ★361 · Updated last week
- A project to improve skills of large language models ★568 · Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ★256 · Updated 4 months ago
- ★207 · Updated 6 months ago
- ★191 · Updated 5 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ★245 · Updated 4 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ★281 · Updated this week
- SkyRL: A Modular Full-stack RL Library for LLMs ★906 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ★343 · Updated 9 months ago
- Code for the paper: "Learning to Reason without External Rewards" ★355 · Updated 2 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ★1,100 · Updated last month
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ★434 · Updated last year
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… ★263 · Updated last week
- A version of verl to support diverse tool use ★551 · Updated last week
- The HELMET Benchmark ★172 · Updated last month
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ★328 · Updated 10 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ★257 · Updated last year
- ★948 · Updated 3 months ago
- Tina: Tiny Reasoning Models via LoRA ★284 · Updated last week
- Repo of paper "Free Process Rewards without Process Labels" ★164 · Updated 6 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ★172 · Updated 4 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025] ★172 · Updated 2 months ago
- Automatic evals for LLMs ★533 · Updated 3 months ago