Asap7772 / understanding-rlhf

Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline, maximum likelihood objectives.

☆32

Alternatives and similar repositories for understanding-rlhf

Users who are interested in understanding-rlhf are comparing it to the libraries listed below.

hamishivi / EasyLM
Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆76 · Updated last year
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆58 · Updated last year
gregorbachmann / Next-Token-Failures
☆106 · Updated last year
cmu-l3 / neurips2024-inference-tutorial-code
NeurIPS 2024 tutorial on LLM Inference
☆47 · Updated 11 months ago
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21 · Updated last year
abaheti95 / LoL-RL
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆26 · Updated last year
facebookresearch / rlfh-gen-div
Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity"
☆47 · Updated last year
mnoukhov / async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆67 · Updated 7 months ago
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆92 · Updated 5 months ago
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆57 · Updated last year
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆28 · Updated 11 months ago
vwxyzjn / summarize_from_feedback_details
☆158 · Updated last year
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆72 · Updated last year
Cornell-RL / drpo
Dataset Reset Policy Optimization
☆31 · Updated last year
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆122 · Updated 8 months ago
microsoft / RLHF-APA
RL algorithm: Advantage induced policy alignment
☆66 · Updated 2 years ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 · Updated 7 months ago
holarissun / RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆69 · Updated 8 months ago
architsharma97 / dpo-rlaif
☆100 · Updated last year
katiekang1998 / reasoning_generalization
☆33 · Updated 10 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 · Updated last year
WentseChen / Verlog
Verlog: A Multi-turn RL framework for LLM agents
☆66 · Updated 2 weeks ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 · Updated last year
upiterbarg / lintseq
[ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)
☆19 · Updated 9 months ago
liziniu / policy_optimization
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
☆28 · Updated last year
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆50 · Updated 9 months ago
Linear95 / DSP
Domain-specific preference (DSP) data and customized RM fine-tuning.
☆25 · Updated last year
ScalingIntelligence / large_language_monkeys
☆109 · Updated last year
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90 · Updated last year
rosieyzh / openrlhf-pretrain
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
☆26 · Updated last month