0xallam / Direct-Preference-Optimization
Direct Preference Optimization from scratch in PyTorch
☆90 · Updated 2 weeks ago
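
For orientation, the DPO objective this repository implements (Rafailov et al., 2023) fits in a few lines of PyTorch. Below is a minimal sketch, assuming per-sequence log-probabilities have already been gathered under the policy and a frozen reference model; the function name and argument layout are illustrative, not taken from this repository's code.

```python
# Minimal DPO loss sketch (illustrative, not this repository's implementation).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a (batch,) tensor holding the summed log-probability
    of the chosen/rejected response under the policy or reference model."""
    # Implicit rewards: scaled log-ratios of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference loss on the reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```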
Alternatives and similar repositories for Direct-Preference-Optimization:
Users interested in Direct-Preference-Optimization are comparing it to the libraries listed below.
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆203 · Updated 11 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆108 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆136 · Updated 2 months ago
- A Survey on Data Selection for Language Models ☆225 · Updated 6 months ago
- A brief and partial summary of RLHF algorithms. ☆127 · Updated last month
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ☆76 · Updated 3 months ago
- Repo of the paper "Free Process Rewards without Process Labels" ☆143 · Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆191 · Updated last month
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆93 · Updated last year
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆139 · Updated this week
- [NeurIPS 2024] The official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ☆115 · Updated last month
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆57 · Updated 4 months ago
- This is my attempt to create Self-Correcting-LLM, based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆34 · Updated 3 weeks ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆52 · Updated 4 months ago
- RewardBench: the first evaluation tool for reward models. ☆555 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆175 · Updated last month
- RLHF implementation details of OAI's 2019 codebase ☆186 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆168 · Updated 10 months ago
- Critique-out-Loud Reward Models ☆59 · Updated 6 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆133 · Updated last month
- Awesome LLM Self-Consistency: a curated list of self-consistency methods in Large Language Models (a minimal majority-vote sketch follows this list) ☆96 · Updated 8 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 · Updated 7 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆256 · Updated 7 months ago
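
The self-consistency entry above refers to a simple decoding recipe: sample several chain-of-thought completions and majority-vote over their final answers (Wang et al., 2023). A minimal sketch, assuming a hypothetical `generate_answer` callable that returns one sampled final answer per call:

```python
# Self-consistency via majority vote over sampled answers.
# `generate_answer` is a hypothetical stand-in for any sampled LLM call.
from collections import Counter

def self_consistent_answer(generate_answer, prompt, n_samples=8):
    """Sample n answers for the prompt and return the most frequent one."""
    answers = [generate_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```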