sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
☆313 Updated this week
Alternatives and similar repositories for oat:
Users that are interested in oat are comparing it to the libraries listed below
- ☆272 Updated 3 weeks ago
- Reproducible, flexible LLM evaluations ☆186 Updated 2 weeks ago
- Repo of paper "Free Process Rewards without Process Labels" ☆140 Updated 3 weeks ago
- A lightweight reproduction of DeepSeek-R1-Zero with an in-depth analysis of self-reflection behavior. ☆218 Updated last week
- A simple unified framework for evaluating LLMs ☆210 Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆175 Updated 3 weeks ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆405 Updated 11 months ago
- RewardBench: the first evaluation tool for reward models. ☆547 Updated last month
- A project to improve the skills of large language models ☆268 Updated this week
- A brief and partial summary of RLHF algorithms. ☆127 Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆171 Updated last week
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆124 Updated last month
- PyTorch building blocks for the OLMo ecosystem ☆188 Updated this week
- ☆604 Updated last week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆313 Updated 4 months ago
- ☆164 Updated last month
- Open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆181 Updated 8 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆234 Updated 5 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆165 Updated 3 months ago
- ☆182 Updated last month
- ☆147 Updated 3 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆390 Updated 4 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆133 Updated 2 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆150 Updated 5 months ago
- The HELMET Benchmark ☆127 Updated this week
- Benchmarking LLMs with Challenging Tasks from Real Users ☆221 Updated 5 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆314 Updated this week
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆234 Updated last month
- Understanding R1-Zero-Like Training: A Critical Perspective ☆818 Updated last week
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆401 Updated 5 months ago