sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
★336 · Updated 2 weeks ago
Alternatives and similar repositories for oat:
Users that are interested in oat are comparing it to the libraries listed below
- ★287 · Updated last month
- ★192 · Updated 2 months ago
- RewardBench: the first evaluation tool for reward models. ★562 · Updated 2 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ★234 · Updated 3 weeks ago
- A project to improve the skills of large language models. ★354 · Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning. ★195 · Updated last month
- Reproducible, flexible LLM evaluations. ★197 · Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks". ★186 · Updated 3 weeks ago
- Repo for the paper "Free Process Rewards without Process Labels". ★145 · Updated last month
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ★322 · Updated 4 months ago
- ★671 · Updated last week
- Benchmarking LLMs with Challenging Tasks from Real Users. ★221 · Updated 6 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate". ★141 · Updated 2 weeks ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ★409 · Updated last year
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)". ★177 · Updated last month
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ★222 · Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning. ★175 · Updated last month
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨