sail-sg / oatLinks

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

☆418

Alternatives and similar repositories for oat

Users that are interested in oat are comparing it to the libraries listed below

Sorting:

eddycmu / demystify-long-cot
☆306Updated 2 months ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆226Updated 3 weeks ago
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆679Updated this week
knoveleng / open-rs
Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
☆245Updated 2 months ago
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆619Updated last month
ypwang61 / One-Shot-RLVR
official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”
☆330Updated this week
facebookresearch / sweet_rl
Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents onCollaborative Reasoning Tasks
☆231Updated 2 months ago
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆229Updated 3 months ago
sail-sg / oat-zero
A lightweight reproduction of DeepSeek-R1-Zero with indepth analysis of self-reflection behavior.
☆245Updated 3 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆232Updated 2 months ago
NVIDIA / NeMo-Skills
A project to improve skills of large language models
☆490Updated this week
princeton-nlp / HELMET
The HELMET Benchmark
☆161Updated 3 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆160Updated 4 months ago
microsoft / rStar
☆604Updated 2 weeks ago
da03 / Internalize_CoT_Step_by_Step
☆187Updated 3 months ago
TIGER-AI-Lab / verl-tool
A version of verl to support tool use
☆312Updated this week
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,048Updated last week
GAIR-NLP / LIMR
☆205Updated 5 months ago
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆238Updated last year
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆337Updated 3 weeks ago
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆278Updated last year
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆217Updated last week
zwhe99 / DeepMath
A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
☆239Updated last month
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆272Updated 2 months ago
kanishkg / cognitive-behaviors
☆203Updated 4 months ago
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆218Updated 4 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆101Updated 2 weeks ago
mlfoundations / evalchemy
Automatic evals for LLMs
☆488Updated last month
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆168Updated 3 weeks ago
microsoft / rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
☆428Updated last year