okarthikb / DPO

Implementation of Direct Preference Optimization

☆16

Alternatives and similar repositories for DPO

Users who are interested in DPO are comparing it to the repositories listed below.

likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆41 Updated last year
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆32 Updated last year
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆26 Updated 10 months ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100 Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 Updated last year
katiekang1998 / reasoning_generalization
☆33 Updated 9 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
hamishivi / EasyLM
Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Fl…
☆75 Updated last year
RobertCsordas / moeut
☆86 Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training
☆132 Updated last year
adamkarvonen / SAE_BoardGameEval
☆23 Updated 9 months ago
mnoukhov / async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆64 Updated 6 months ago
berlino / seq_icl
☆53 Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84 Updated last year
abhishekpanigrahi1996 / transformer_in_transformer
☆45 Updated 2 years ago
gregorbachmann / Next-Token-Failures
☆103 Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 Updated last year
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90 Updated 10 months ago
mcleish7 / gemstone-scaling-laws
Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)
☆29 Updated last month
cmu-l3 / neurips2024-inference-tutorial-code
NeurIPS 2024 tutorial on LLM Inference
☆47 Updated 10 months ago
sustcsonglin / mamba-triton
☆48 Updated last year
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35 Updated last month
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆59 Updated last year
architsharma97 / dpo-rlaif
☆100 Updated last year
KaiNylund / lm-weights-encode-time
☆69 Updated last year
microsoft / RLHF-APA
RL algorithm: Advantage induced policy alignment
☆65 Updated 2 years ago
Edward-Sun / gpt-accelera
Simple and efficient PyTorch-native transformer training and inference (batched)
☆78 Updated last year
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27 Updated 2 years ago
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆64 Updated last year
DeqingFu / transformers-icl-second-order
Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode…
☆19 Updated 11 months ago