0xallam / Direct-Preference-Optimization
Direct Preference Optimization from scratch in PyTorch
☆90 · Updated 2 weeks ago
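
For orientation, the DPO objective this repository implements (Rafailov et al., 2023) fits in a few lines of PyTorch. Below is a minimal sketch, assuming per-sequence log-probabilities have already been gathered under the policy and a frozen reference model; the function name and argument layout are illustrative, not taken from this repository's code.

```python
# Minimal DPO loss sketch (illustrative, not this repository's implementation).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a (batch,) tensor holding the summed log-probability
    of the chosen/rejected response under the policy or reference model."""
    # Implicit rewards: scaled log-ratios of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference loss on the reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```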
Alternatives and similar repositories for Direct-Preference-Optimization:
Users interested in Direct-Preference-Optimization are comparing it to the libraries listed below.
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆203 · Updated 11 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆108 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆136 · Updated 2 months ago
- A Survey on Data Selection for Language Models ☆225 · Updated 6 months ago
- A brief and partial summary of RLHF algorithms. ☆127 · Updated last month
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ☆76 · Updated 3 months ago
- Repo of the paper "Free Process Rewards without Process Labels" ☆143 · Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆191 · Updated last month
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆93 · Updated last year
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆139 · Updated this week
- [NeurIPS 2024] The official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ☆115 · Updated last month
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆57 · Updated 4 months ago
- This is my attempt to create Self-Correcting-LLM, based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆34 · Updated 3 weeks ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆52 · Updated 4 months ago
- RewardBench: the first evaluation tool for reward models. ☆555 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆175 · Updated last month
- RLHF implementation details of OAI's 2019 codebase ☆186 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆168 · Updated 10 months ago
- Critique-out-Loud Reward Models ☆59 · Updated 6 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆133 · Updated last month
- Awesome LLM Self-Consistency: a curated list of self-consistency methods in Large Language Models (a minimal majority-vote sketch follows this list) ☆96 · Updated 8 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 · Updated 7 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆256 · Updated 7 months ago
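
The self-consistency entry above refers to a simple decoding recipe: sample several chain-of-thought completions and majority-vote over their final answers (Wang et al., 2023). A minimal sketch, assuming a hypothetical `generate_answer` callable that returns one sampled final answer per call:

```python
# Self-consistency via majority vote over sampled answers.
# `generate_answer` is a hypothetical stand-in for any sampled LLM call.
from collections import Counter

def self_consistent_answer(generate_answer, prompt, n_samples=8):
    """Sample n answers for the prompt and return the most frequent one."""
    answers = [generate_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```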