sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
☆224 Updated last week
Alternatives and similar repositories for oat
Users who are interested in oat are comparing it to the libraries listed below:
- Reproducible, flexible LLM evaluations ☆176 Updated 3 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆136 Updated last week
- A simple unified framework for evaluating LLMs ☆204 Updated 2 weeks ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆218 Updated 4 months ago
- ☆143 Updated 3 months ago
- ☆156 Updated 2 weeks ago
- PyTorch building blocks for the OLMo ecosystem ☆165 Updated this week
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆122 Updated 3 weeks ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆163 Updated 2 weeks ago
- ☆95 Updated 8 months ago
- RewardBench: the first evaluation tool for reward models. ☆526 Updated 3 weeks ago
- ☆135 Updated 3 months ago
- A brief and partial summary of RLHF algorithms. ☆124 Updated 2 weeks ago
- A project to improve skills of large language models ☆256 Updated this week
- The HELMET Benchmark ☆121 Updated this week
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 Updated 6 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆119 Updated 6 months ago
- ☆102 Updated 2 months ago
- Code and example data for the paper: Rule Based Rewards for Language Model Safety ☆183 Updated 8 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆142 Updated 4 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆179 Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 Updated 2 months ago
- Replicating O1 inference-time scaling laws ☆83 Updated 3 months ago
- ☆111 Updated last month