sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆604 Updated last week
Alternatives and similar repositories for oat
Users who are interested in oat are comparing it to the libraries listed below.
- Reproducible, flexible LLM evaluations ☆312 Updated last month
- ☆329 Updated 7 months ago
- A project to improve skills of large language models ☆727 Updated last week
- RewardBench: the first evaluation tool for reward models. ☆674 Updated 6 months ago
- A Gym for Agentic LLMs ☆411 Updated last week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆365 Updated last year
- Code for the paper: "Learning to Reason without External Rewards" ☆385 Updated 5 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆332 Updated 2 months ago
- A simple unified framework for evaluating LLMs ☆259 Updated 8 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆453 Updated last year
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… ☆407 Updated last month
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆390 Updated last month
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆573 Updated 2 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆270 Updated 2 months ago
- PyTorch building blocks for the OLMo ecosystem ☆634 Updated last week
- Tina: Tiny Reasoning Models via LoRA ☆310 Updated 3 months ago
- Minimal hackable GRPO implementation ☆308 Updated 11 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆341 Updated last month
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,180 Updated 4 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆254 Updated 7 months ago
- Automatic evals for LLMs ☆569 Updated last week
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆335 Updated 2 weeks ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,415 Updated this week
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆343 Updated last week
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆271 Updated last year
- The HELMET Benchmark ☆197 Updated last month
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆283 Updated last year
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆249 Updated 8 months ago
- ☆1,050 Updated 6 months ago
- ☆218 Updated 9 months ago