zwhong714 / weak-to-strong-preference-optimization
[ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model
☆15Updated 8 months ago
Alternatives and similar repositories for weak-to-strong-preference-optimization
Users who are interested in weak-to-strong-preference-optimization are comparing it to the repositories listed below.
- Documentation at☆12Updated 7 months ago
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source…☆66Updated 9 months ago
- ☆19Updated 6 months ago
- ☆22Updated 2 months ago
- Code for Efficient Test-Time Scaling via Self-Calibration☆18Updated last month
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?☆36Updated 4 months ago
- [NeurIPS25 Spotlight] EMPO, A Fully Unsupervised RLVR Method☆74Updated 2 weeks ago
- ☆14Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆86Updated 8 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆27Updated 8 months ago
- ☆22Updated 5 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆47Updated 3 months ago
- ☆32Updated 6 months ago
- Open-Pandora: On-the-fly Control Video Generation☆35Updated 11 months ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"☆20Updated last month
- ☆20Updated 8 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…☆46Updated last year
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆36Updated 6 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆48Updated last year
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆28Updated 4 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"☆21Updated 3 months ago
- ☆16Updated 7 months ago
- Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models"☆22Updated last month
- ☆179Updated 5 months ago
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆107Updated 4 months ago
- ☆22Updated last week
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"☆20Updated 6 months ago
- ☆30Updated last month
- ECSO (Make MLLM safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572)☆33Updated last year
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"☆34Updated last month