ahmed-alllam / Direct-Preference-Optimization

Direct Preference Optimization from scratch in PyTorch

☆84

Alternatives and similar repositories for Direct-Preference-Optimization:

Users who are interested in Direct-Preference-Optimization are comparing it to the repositories listed below.

ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆176 · Updated 10 months ago
tongyx361 / Awesome-LLM4Math
Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…
☆114 · Updated 7 months ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆68 · Updated last month
vwxyzjn / summarize_from_feedback_details
☆134 · Updated 3 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
☆128 · Updated 2 weeks ago
for-ai / parameter-efficient-moe
☆251 · Updated last year
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆160 · Updated last week
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆143 · Updated 5 months ago
FreedomIntelligence / OVM
☆65 · Updated 11 months ago
alon-albalak / data-selection-survey
A Survey on Data Selection for Language Models
☆213 · Updated 4 months ago
zankner / CLoud
Critique-out-Loud Reward Models
☆53 · Updated 4 months ago
allenai / FineGrainedRLHF
☆269 · Updated last month
YuxiXie / MCTS-DPO
Source code for Self-Evaluation Guided MCTS for online DPO.
☆290 · Updated 6 months ago
QwenLM / ProcessBench
☆136 · Updated 2 months ago
WindyLee0822 / Process_Q_Model
Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆49 · Updated 3 weeks ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆129 · Updated last year
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆157 · Updated 8 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆95 · Updated 2 months ago
chujiezheng / LLM-Extrapolation
Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"
☆72 · Updated 8 months ago
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆153 · Updated 8 months ago
PRIME-RL / ImplicitPRM
Repository for the paper "Free Process Rewards without Process Labels"
☆128 · Updated last month
sail-sg / CPO
[NeurIPS 2024] Official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs".
☆96 · Updated 4 months ago
architsharma97 / dpo-rlaif
☆95 · Updated 8 months ago
vwxyzjn / lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
☆181 · Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆48 · Updated 2 months ago
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆140 · Updated 4 months ago
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆516 · Updated this week
allenai / olmes
Reproducible, flexible LLM evaluations
☆166 · Updated 2 months ago