okarthikb / DPO
Implementation of Direct Preference Optimization
☆15 · Updated last year
Alternatives and similar repositories for DPO:
Users who are interested in DPO are comparing it to the repositories listed below.
- Minimal but scalable implementation of large language models in JAX ☆32 · Updated 4 months ago
- ☆25 · Updated 10 months ago
- ☆28 · Updated 2 months ago
- Learn online intrinsic rewards from LLM feedback ☆34 · Updated 2 months ago
- ☆51 · Updated 9 months ago
- ☆28 · Updated last month
- ☆13 · Updated 3 months ago
- Efficient Scaling laws and collaborative pretraining. ☆15 · Updated last month
- ☆80 · Updated 11 months ago
- ☆21 · Updated 5 months ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆15 · Updated 2 weeks ago
- The repository contains code for Adaptive Data Optimization ☆20 · Updated 2 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆31 · Updated 3 months ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆40 · Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆52 · Updated 11 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- ☆42 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆47 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- ☆28 · Updated last year
- ☆12 · Updated 11 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 · Updated 8 months ago
- ☆17 · Updated 7 months ago
- ☆44 · Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 9 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆61 · Updated 9 months ago
- ☆73 · Updated 6 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆38 · Updated last month
- Using FlexAttention to compute attention with different masking patterns ☆41 · Updated 5 months ago