rosieyzh / openrlhf-pretrain

Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"

☆18

Alternatives and similar repositories for openrlhf-pretrain

Users who are interested in openrlhf-pretrain are comparing it to the repositories listed below.

ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆112 Updated 3 months ago
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆71 Updated 3 weeks ago
holarissun / RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆65 Updated 3 months ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆39 Updated last week
facebookresearch / iGSM
The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…
☆59 Updated 6 months ago
alexrame / rewardedsoups
Rewarded soups official implementation
☆58 Updated last year
spiral-rl / spiral
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
☆103 Updated last week
facebookresearch / rlfh-gen-div
Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity"
☆44 Updated last year
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 Updated 3 months ago
upiterbarg / lintseq
[ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)
☆19 Updated 5 months ago
Parallel-Reasoning / APR
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
☆114 Updated 2 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆75 Updated last month
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 Updated 10 months ago
princeton-pli / what-makes-good-rm
What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆34 Updated 3 weeks ago
katiekang1998 / reasoning_generalization
☆33 Updated 6 months ago
gregorbachmann / Next-Token-Failures
☆87 Updated last year
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆29 Updated last year
cassidylaidlaw / orpo
☆17 Updated 8 months ago
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆58 Updated last year
ZhentingWang / DUMP
☆20 Updated 2 months ago
Jiacheng-Zhu-AIML / AsymmetryLoRA
Preprint: Asymmetry in Low-Rank Adapters of Foundation Models
☆35 Updated last year
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆166 Updated last month
liziniu / GEM
Code for the paper "Preserving Diversity in Supervised Fine-tuning of Large Language Models"
☆32 Updated 2 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47 Updated 4 months ago
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆11 Updated 3 months ago
MLE-Dojo / MLE-Dojo
☆54 Updated 2 weeks ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆33 Updated 3 months ago
LLM360 / Reasoning360
A repo for open research on building large reasoning models
☆68 Updated last week
LAMDASZ-ML / Self-Backtracking
☆47 Updated 5 months ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆58 Updated 4 months ago