PKU-Alignment / alignerLinks

[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct

☆188

Alternatives and similar repositories for aligner

Users that are interested in aligner are comparing it to the libraries listed below

Sorting:

Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization(TDPO)
☆148Updated 8 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆130Updated 7 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆115Updated 2 months ago
kanishkg / cognitive-behaviors
☆210Updated 7 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56Updated 10 months ago
rookie-joe / AutoPSV
☆50Updated 11 months ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82Updated 9 months ago
GAIR-NLP / LIMR
☆211Updated 8 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆151Updated 11 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆164Updated 7 months ago
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆138Updated 3 months ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆64Updated 3 months ago
yubol-bobo / Awesome-Multi-Turn-LLMs
This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …
☆125Updated 5 months ago
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆194Updated last year
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆61Updated 10 months ago
WeiminXiong / IPR
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆62Updated last year
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆91Updated last year
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆115Updated 3 weeks ago
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆174Updated 5 months ago
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆89Updated last week
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆136Updated 6 months ago
Tim-Siu / reft-exp
A research repo for experiments about Reinforcement Finetuning
☆52Updated 6 months ago
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆327Updated last year
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆81Updated 8 months ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆37Updated last year
tongyx361 / Awesome-LLM4Math
Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…
☆143Updated last year
Zanette-Labs / efficient-reasoning
☆67Updated 6 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆112Updated 2 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆360Updated 3 months ago
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆43Updated 7 months ago