sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆573 · Updated last month
Alternatives and similar repositories for oat
Users who are interested in oat are comparing it to the libraries listed below.
- ☆327 · Updated 6 months ago
- A project to improve skills of large language models · ☆619 · Updated last week
- Reproducible, flexible LLM evaluations · ☆286 · Updated last week
- RewardBench: the first evaluation tool for reward models. · ☆660 · Updated 5 months ago
- A Gym for Agentic LLMs · ☆364 · Updated 3 weeks ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning · ☆316 · Updated last month
- A simple unified framework for evaluating LLMs · ☆254 · Updated 7 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" · ☆268 · Updated last month
- Code for the paper: "Learning to Reason without External Rewards" · ☆380 · Updated 4 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective · ☆1,157 · Updated 3 months ago
- Tina: Tiny Reasoning Models via LoRA · ☆308 · Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… · ☆358 · Updated 11 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example · ☆381 · Updated last week
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. · ☆447 · Updated last year
- ☆1,010 · Updated 4 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ · ☆267 · Updated last year
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" · ☆252 · Updated 6 months ago
- Automatic evals for LLMs · ☆558 · Updated 5 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. · ☆248 · Updated 7 months ago
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… · ☆369 · Updated 2 weeks ago
- ☆216 · Updated 8 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" · ☆181 · Updated 6 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning · ☆257 · Updated 6 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs · ☆1,287 · Updated this week
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" · ☆340 · Updated 3 weeks ago
- PyTorch building blocks for the OLMo ecosystem · ☆400 · Updated last week
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. · ☆316 · Updated this week
- ☆199 · Updated 7 months ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning · ☆277 · Updated 2 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] · ☆579 · Updated 4 months ago