0xallam / Direct-Preference-Optimization
Direct Preference Optimization from scratch in PyTorch
☆103 · Updated 4 months ago
Alternatives and similar repositories for Direct-Preference-Optimization
Users interested in Direct-Preference-Optimization are comparing it to the repositories listed below.
- A Survey on Data Selection for Language Models ☆245 · Updated 3 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆239 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆143 · Updated 5 months ago
- ☆278 · Updated 7 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆114 · Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆166 · Updated 2 months ago
- ☆269 · Updated last year
- Critique-out-Loud Reward Models ☆70 · Updated 9 months ago
- ☆309 · Updated 2 months ago
- ☆203 · Updated 4 months ago
- ☆205 · Updated 5 months ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆474 · Updated 9 months ago
- RewardBench: the first evaluation tool for reward models. ☆622 · Updated last month
- Project for the paper entitled `Instruction Tuning for Large Language Models: A Survey` ☆180 · Updated 8 months ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models ☆105 · Updated 2 weeks ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆133 · Updated last year
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆504 · Updated 6 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ☆76 · Updated 2 months ago
- ☆65 · Updated 3 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆80 · Updated 6 months ago
- ☆117 · Updated 4 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆98 · Updated 2 weeks ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆360 · Updated 11 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆175 · Updated 3 months ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆206 · Updated 2 years ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆183 · Updated last year
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆234 · Updated 2 months ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆128 · Updated last year
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆123 · Updated 3 weeks ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆181 · Updated 6 months ago