hkproj / dpo-notes
Notes on Direct Preference Optimization
☆23Updated last year
Alternatives and similar repositories for dpo-notes
Users that are interested in dpo-notes are comparing it to the libraries listed below
- minimal GRPO implementation from scratch☆99Updated 8 months ago
- a curated list of the role of small models in the LLM era☆108Updated last year
- Distributed training (multi-node) of a Transformer model☆86Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆297Updated this week
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆82Updated last year
- ☆99Updated last year
- This is the official repository for Inheritune.☆115Updated 9 months ago
- Tina: Tiny Reasoning Models via LoRA☆304Updated last month
- Notes and commented code for RLHF (PPO)☆114Updated last year
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".☆179Updated 7 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆162Updated 7 months ago
- ☆129Updated last year
- SSRL: Self-Search Reinforcement Learning☆151Updated 2 months ago
- ☆98Updated this week
- Verifiers for LLM Reinforcement Learning☆79Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper.☆79Updated last year
- Survey of Small Language Models from Penn State, ...☆213Updated last week
- ☆94Updated 5 months ago
- ☆48Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆109Updated 5 months ago
- NeurIPS 2024 tutorial on LLM Inference☆47Updated 11 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆125Updated 6 months ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments☆88Updated this week
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.☆138Updated last year
- ☆52Updated last year
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation☆68Updated 6 months ago
- ☆77Updated 2 weeks ago
- Complex Function Calling Benchmark.☆147Updated 9 months ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆48Updated 9 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆144Updated 9 months ago