zwhong714 / weak-to-strong-preference-optimization
[ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model
☆14 · Updated 7 months ago
Alternatives and similar repositories for weak-to-strong-preference-optimization
Users who are interested in weak-to-strong-preference-optimization are comparing it to the repositories listed below.
- ☆14 · Updated last year
- ☆20 · Updated 8 months ago
- ☆19 · Updated 6 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆26 · Updated 8 months ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆19 · Updated last month
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆86 · Updated 8 months ago
- ☆30 · Updated 5 months ago
- Documentation at ☆12 · Updated 6 months ago
- ☆15 · Updated 6 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆69 · Updated 5 months ago
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning" ☆33 · Updated 2 weeks ago
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆36 · Updated 3 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆47 · Updated 3 months ago
- ☆21 · Updated last month
- ☆27 · Updated last month
- ☆18 · Updated 3 months ago
- ☆25 · Updated 6 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆26 · Updated 3 months ago
- ☆44 · Updated 3 weeks ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation" ☆20 · Updated 2 months ago
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source… ☆65 · Updated 9 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆28 · Updated 3 months ago
- ☆36 · Updated last week
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆47 · Updated 3 weeks ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆23 · Updated last week
- [2025-TMLR] A Survey on the Honesty of Large Language Models ☆60 · Updated 10 months ago
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning". ☆19 · Updated 7 months ago
- ☆22 · Updated 5 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆34 · Updated 10 months ago
- Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models" ☆21 · Updated last month