sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
⭐ 627 · Updated last week
Alternatives and similar repositories for oat
Users that are interested in oat are comparing it to the libraries listed below
- Reproducible, flexible LLM evaluations ⭐ 337 · Updated 2 weeks ago
- A Gym for Agentic LLMs ⭐ 444 · Updated 3 weeks ago
- RewardBench: the first evaluation tool for reward models. ⭐ 685 · Updated last week
- ⭐ 330 · Updated 8 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ⭐ 1,205 · Updated 5 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ⭐ 371 · Updated last year
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ⭐ 405 · Updated 2 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ⭐ 273 · Updated 3 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ⭐ 350 · Updated last week
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ⭐ 593 · Updated 4 months ago
- [ICLR 2026] Learning to Reason without External Rewards ⭐ 389 · Updated 2 weeks ago
- A project to improve skills of large language models ⭐ 813 · Updated this week
- [ICLR 2026] Tina: Tiny Reasoning Models via LoRA ⭐ 319 · Updated 4 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ⭐ 344 · Updated 3 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ⭐ 273 · Updated last year
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ⭐ 283 · Updated 4 months ago
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… ⭐ 427 · Updated 2 weeks ago
- A simple unified framework for evaluating LLMs ⭐ 261 · Updated 9 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ⭐ 261 · Updated 8 months ago
- ⭐ 1,088 · Updated last month
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ⭐ 361 · Updated last week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ⭐ 249 · Updated 9 months ago
- Automatic evals for LLMs ⭐ 579 · Updated last month
- ⭐ 224 · Updated 10 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ⭐ 456 · Updated last year
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training ⭐ 285 · Updated last year
- SkyRL: A Modular Full-stack RL Library for LLMs ⭐ 1,547 · Updated this week
- ⭐ 203 · Updated 9 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ⭐ 261 · Updated 9 months ago
- The HELMET Benchmark ⭐ 198 · Updated 2 months ago