sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆404 · Updated this week
Alternatives and similar repositories for oat
Users interested in oat are comparing it to the libraries listed below.
- ☆303 · Updated last month
- Reproducible, flexible LLM evaluations ☆215 · Updated 2 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆574 · Updated this week
- Official repo for the paper "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆244 · Updated 2 months ago
- RewardBench: the first evaluation tool for reward models ☆609 · Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆223 · Updated 2 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior ☆244 · Updated 2 months ago
- ☆182 · Updated 2 months ago
- A simple unified framework for evaluating LLMs ☆221 · Updated 2 months ago
- A project to improve the skills of large language models ☆456 · Updated this week
- Official repository for "Reinforcement Learning for Reasoning in Large Language Models with One Training Example" ☆315 · Updated last week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆207 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆341 · Updated 7 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆226 · Updated last month
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs ☆428 · Updated last year
- The HELMET Benchmark ☆155 · Updated 2 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,023 · Updated last week
- Code for the paper "Learning to Reason without External Rewards" ☆317 · Updated this week
- PyTorch building blocks for the OLMo ecosystem ☆258 · Updated this week
- Repo of the paper "Free Process Rewards without Process Labels" ☆154 · Updated 3 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM) ☆258 · Updated 3 weeks ago
- ☆199 · Updated 3 months ago
- ☆205 · Updated 4 months ago
- Automatic evals for LLMs ☆461 · Updated 2 weeks ago
- Tina: Tiny Reasoning Models via LoRA ☆266 · Updated last month
- (ICML 2024) AlphaZero-like tree search can guide large language model decoding and training ☆278 · Updated last year
- ☆585 · Updated 2 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks 🧮✨ ☆227 · Updated last year
- Code and example data for the paper "Rule Based Rewards for Language Model Safety" ☆188 · Updated 11 months ago
- Code for the NeurIPS 2024 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" ☆220 · Updated 7 months ago