princeton-pli / what-makes-good-rm
What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆15 · Updated last week
Alternatives and similar repositories for what-makes-good-rm:
Users interested in what-makes-good-rm are comparing it to the repositories listed below
- Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…" ☆37 · Updated 3 weeks ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆72 · Updated 7 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆77 · Updated this week
- [ICLR 2025] Code & data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 · Updated 9 months ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆35 · Updated last year
- Lightweight Adapting for Black-Box Large Language Models ☆22 · Updated last year
- Official implementation of Rewarded Soups ☆56 · Updated last year
- ☆48 · Updated 4 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆46 · Updated 4 months ago
- [ICLR 2025, Spotlight] When Attention Sink Emerges in Language Models: An Empirical View ☆54 · Updated 5 months ago
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆28 · Updated last year
- ☆28 · Updated last month
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆62 · Updated 3 months ago
- Code for the paper "Aligning Large Language Models with Representation Editing: A Control Perspective" ☆25 · Updated 2 months ago
- ☆30 · Updated 5 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆139 · Updated 3 weeks ago
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning" ☆39 · Updated last year
- ☆17 · Updated last month
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 5 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆56 · Updated last month
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 · Updated 2 months ago
- Representation Surgery for Multi-Task Model Merging (ICML 2024) ☆42 · Updated 5 months ago
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆24 · Updated last year
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆29 · Updated 11 months ago
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts ☆16 · Updated last year
- Code for "Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective" ☆19 · Updated last year
- Analyzing and Reducing Catastrophic Forgetting in Parameter-Efficient Tuning ☆30 · Updated 4 months ago
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives" ☆19 · Updated 5 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆34 · Updated 8 months ago
- Official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… ☆24 · Updated 3 months ago