zwhong714 / weak-to-strong-preference-optimization

[ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model

☆15

Alternatives and similar repositories for weak-to-strong-preference-optimization

Users who are interested in weak-to-strong-preference-optimization are comparing it to the repositories listed below.

wang8740 / MAP
Documentation at
☆13 Updated 8 months ago
yihuaihong / Dissecting-FT-Unlearning
☆14 Updated last year
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆87 Updated 9 months ago
Raibows / CREAM
Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.
☆27 Updated 9 months ago
StarDewXXX / AdaR1
The official repository of the NeurIPS'25 paper "Ada-R1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆20 Updated 3 weeks ago
AI45Lab / REEF
The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source…
☆68 Updated 10 months ago
ZJU-REAL / EasySteer
A Unified Framework for High-Performance and Extensible LLM Steering
☆131 Updated last week
sail-sg / ActivePRM
☆19 Updated 7 months ago
SophieZheng998 / ALI-Agent
Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"
☆21 Updated 3 months ago
Joshua-Ren / Learning_dynamics_LLM
☆184 Updated 6 months ago
ZhentingWang / DUMP
☆32 Updated 6 months ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆47 Updated 4 months ago
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆43 Updated 3 months ago
shivamag125 / EM_PT
☆24 Updated 3 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆142 Updated 4 months ago
GaryStack / MMR-V
Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?
☆36 Updated 5 months ago
aeroplanepaper / GRPO-LEAD
☆30 Updated last week
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆52 Updated 2 months ago
Jihuai-wpy / InferAligner
☆37 Updated last year
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆72 Updated 7 months ago
eric-ai-lab / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆30 Updated 5 months ago
QingyangZhang / EMPO
[NeurIPS25 Spotlight] EMPO, A Fully Unsupervised RLVR Method
☆84 Updated this week
ChnQ / MI-Peaks
☆56 Updated 4 months ago
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆48 Updated 5 months ago
lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)
☆167 Updated 3 weeks ago
kokolerk / TON
[NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆51 Updated 2 months ago
Hongcheng-Gao / HAVEN
Code and data for the paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆22 Updated last month
UCSB-NLP-Chang / ThinkPrune
☆45 Updated 2 months ago
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆82 Updated 5 months ago
alenai97 / PEFT-MLLM
Official code and data for the ACL 2024 Findings paper "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"
☆23 Updated last year