okarthikb / DPO
Implementation of Direct Preference Optimization
☆16 Updated last year
Related projects
Alternatives and complementary repositories for DPO
- Minimal but scalable implementation of large language models in JAX ☆26 Updated 3 weeks ago
- Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22 ☆64 Updated 2 years ago
- ☆24 Updated 7 months ago
- ☆50 Updated 6 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆84 Updated 2 weeks ago
- ☆18 Updated 2 months ago
- ☆19 Updated last month
- ☆44 Updated last year
- ☆26 Updated last month
- Simple and efficient pytorch-native transformer training and inference (batched) ☆61 Updated 7 months ago
- ☆55 Updated 2 months ago
- ☆45 Updated 9 months ago
- ☆53 Updated 3 weeks ago
- Directional Preference Alignment ☆51 Updated 2 months ago
- ☆29 Updated this week
- ☆25 Updated 3 weeks ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆48 Updated 7 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆62 Updated 5 months ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆54 Updated 3 months ago
- Efficient Scaling laws and collaborative pretraining. ☆13 Updated last week
- ☆74 Updated 4 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆79 Updated 10 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆37 Updated 5 months ago
- ☆34 Updated last year
- ☆13 Updated 3 months ago
- Triton Implementation of HyperAttention Algorithm ☆46 Updated 11 months ago
- A library for efficient patching and automatic circuit discovery. ☆32 Updated last month
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆84 Updated 8 months ago
- JAX implementation of the Mistral 7b v0.1 model ☆13 Updated 7 months ago
- ☆36 Updated 3 months ago