architsharma97 / dpo-rlaif
☆96 Updated 9 months ago
Alternatives and similar repositories for dpo-rlaif:
Users who are interested in dpo-rlaif compare it to the repositories listed below:
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆44 Updated last month
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆119 Updated 6 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆103 Updated last year
- ☆163 Updated 3 weeks ago
- The official implementation of Self-Exploring Language Models (SELM) ☆62 Updated 9 months ago
- ☆111 Updated last month
- Self-Alignment with Principle-Following Reward Models ☆156 Updated last year
- Replicating O1 inference-time scaling laws ☆83 Updated 4 months ago
- ☆103 Updated 2 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆80 Updated last week
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆110 Updated 10 months ago
- ☆68 Updated 4 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆138 Updated 2 weeks ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆73 Updated 9 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin… ☆48 Updated 10 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 Updated last month
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆73 Updated last year
- Critique-out-Loud Reward Models ☆56 Updated 5 months ago
- Directional Preference Alignment ☆56 Updated 6 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆77 Updated this week
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆148 Updated 4 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆51 Updated last month
- ☆136 Updated 4 months ago
- ☆119 Updated 6 months ago
- ☆84 Updated 9 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆108 Updated 3 weeks ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆54 Updated 6 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 Updated 6 months ago
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆124 Updated last month
- ☆59 Updated 2 weeks ago