hkproj / dpo-notes
Notes on Direct Preference Optimization
☆19 · Updated last year
Alternatives and similar repositories for dpo-notes
Users who are interested in dpo-notes are comparing it to the libraries listed below.
- a curated list of the role of small models in the LLM era ☆100 · Updated 8 months ago
- Distributed training (multi-node) of a Transformer model ☆68 · Updated last year
- minimal GRPO implementation from scratch ☆90 · Updated 2 months ago
- ☆47 · Updated 9 months ago
- Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/ ☆53 · Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆73 · Updated 7 months ago
- ☆120 · Updated 8 months ago
- ☆59 · Updated 10 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆54 · Updated 3 months ago
- Notes and commented code for RLHF (PPO) ☆94 · Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆47 · Updated 4 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆58 · Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆107 · Updated 2 weeks ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct ☆29 · Updated 3 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆50 · Updated last month
- ☆87 · Updated 8 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆45 · Updated 5 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆64 · Updated 10 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆248 · Updated last week
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆173 · Updated 2 months ago
- Official Implementation of "Reasoning Language Models: A Blueprint" ☆62 · Updated 3 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 · Updated 8 months ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆54 · Updated 8 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 · Updated 8 months ago
- ☆17 · Updated last month
- ☆64 · Updated last year
- ☆41 · Updated last week
- ☆33 · Updated last month
- Code implementation of synthetic continued pretraining ☆110 · Updated 4 months ago