hkproj / dpo-notes
Notes on Direct Preference Optimization
☆24 · Updated last year
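The repo above collects notes on Direct Preference Optimization. As context for the list below, a minimal sketch of the DPO objective (Rafailov et al., 2023) might look like the following; the function name and the convention of passing per-sequence summed log-probabilities are assumptions for illustration, not this repo's API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss.

    Each argument is a 1-D tensor of log-probabilities (summed over
    tokens) for the chosen/rejected responses under the policy and the
    frozen reference model; beta scales the implicit KL penalty.
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = pi_logratios - ref_logratios
    # -log sigmoid(beta * logits): decreases as the policy prefers the
    # chosen response more strongly than the reference model does.
    return -F.logsigmoid(beta * logits).mean()
```

When the policy already favors the chosen response more than the reference does, `logits > 0` and the loss falls below `log 2`; otherwise the gradient pushes the chosen log-probability up and the rejected one down.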
Alternatives and similar repositories for dpo-notes
Users interested in dpo-notes are also looking at the repositories listed below.
- Minimal GRPO implementation from scratch ☆102 · Updated 10 months ago
- Distributed training (multi-node) of a Transformer model ☆92 · Updated last year
- A curated list on the role of small models in the LLM era ☆111 · Updated last year
- Survey: a collection of awesome papers and resources on the latest research in Mixture of Experts ☆141 · Updated last year
- ☆129 · Updated last year
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆87 · Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM) ☆344 · Updated last month
- [TMLR 2026] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models ☆121 · Updated 11 months ago
- ☆100 · Updated last year
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct ☆31 · Updated 11 months ago
- Advanced NLP, Spring 2025: https://cmu-l3.github.io/anlp-spring2025/ ☆71 · Updated 10 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆116 · Updated this week
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆163 · Updated 9 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed" ☆188 · Updated 2 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆47 · Updated last year
- ☆48 · Updated last year
- ☆139 · Updated last year
- An extension of the nanoGPT repository for training small MoE models ☆231 · Updated 10 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆118 · Updated last year
- ☆89 · Updated 3 months ago
- ☆53 · Updated 11 months ago
- ☆42 · Updated last year
- Tina: Tiny Reasoning Models via LoRA ☆313 · Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆58 · Updated last week
- ☆112 · Updated last year
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆59 · Updated last year
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆54 · Updated last year
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆125 · Updated last year
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆68 · Updated 9 months ago