Optimization-AI / DisCO
Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
☆25 · Updated 3 weeks ago
Alternatives and similar repositories for DisCO
Users interested in DisCO are comparing it to the repositories listed below.
- Official implementation of the ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆62 · Updated 2 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆22 · Updated last year
- Code for "Reasoning to Learn from Latent Thoughts" ☆105 · Updated 2 months ago
- ☆51 · Updated 2 months ago
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆42 · Updated 8 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆81 · Updated 10 months ago
- What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆32 · Updated this week
- [ICML 2024] Official repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆57 · Updated last year
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆38 · Updated last month
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆29 · Updated 5 months ago
- Official implementation of Rewarded Soups ☆58 · Updated last year
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆37 · Updated 2 weeks ago
- Directional Preference Alignment ☆57 · Updated 9 months ago
- RENT (Reinforcement Learning via Entropy Minimization), an unsupervised method for training reasoning LLMs ☆28 · Updated 3 weeks ago
- Code for the paper "Preserving Diversity in Supervised Fine-tuning of Large Language Models" ☆30 · Updated last month
- Official repository of LatentSeek ☆49 · Updated 3 weeks ago
- A Sober Look at Language Model Reasoning ☆74 · Updated last week
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 · Updated 2 months ago
- Official implementation of the Reward rAnked Fine-Tuning algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆32 · Updated 9 months ago
- ☆40 · Updated last year
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆24 · Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆121 · Updated 9 months ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆35 · Updated last year
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆72 · Updated 2 weeks ago
- Official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆65 · Updated 2 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆18 · Updated 2 months ago
- ☆86 · Updated last year
- ☆15 · Updated 2 months ago
- [ICLR 2025 Spotlight] When Attention Sink Emerges in Language Models: An Empirical View ☆88 · Updated 8 months ago
- Unofficial implementation of Chain-of-Thought Reasoning Without Prompting ☆32 · Updated last year