XueruiSu / Trust-Region-Preference-Approximation
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
☆13 Updated 4 months ago
Alternatives and similar repositories for Trust-Region-Preference-Approximation
Users who are interested in Trust-Region-Preference-Approximation are comparing it to the repositories listed below.
- Code for "Variational Reasoning for Language Models" ☆52 Updated last month
- Reinforcing General Reasoning without Verifiers ☆91 Updated 4 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆57 Updated last year
- LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs ☆43 Updated 3 months ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆57 Updated last year
- Official implementation of the ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆67 Updated 7 months ago
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning ☆52 Updated 2 weeks ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆32 Updated 3 months ago
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… ☆30 Updated 11 months ago
- ☆33 Updated last year
- Directional Preference Alignment ☆57 Updated last year
- Implementation of dualformer ☆24 Updated 8 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆39 Updated last month
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆91 Updated last year
- ☆105 Updated this week
- ☆45 Updated last month
- A Sober Look at Language Model Reasoning ☆87 Updated last month
- ☆53 Updated 8 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆122 Updated 7 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆47 Updated 3 months ago
- ☆17 Updated 3 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆66 Updated 8 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆79 Updated 4 months ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆11 Updated 6 months ago
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" ☆78 Updated 8 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆27 Updated last month
- Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective" ☆20 Updated 2 years ago
- Structured Chemistry Reasoning with Large Language Models ☆39 Updated last year
- ☆23 Updated 11 months ago
- ☆44 Updated last month