TsinghuaC3I/Unify-Post-Training

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/TsinghuaC3I/Unify-Post-Training)

TsinghuaC3I / Unify-Post-Training

Towards a Unified View of Large Language Model Post-Training

☆211

Alternatives and similar repositories for Unify-Post-Training

Users that are interested in Unify-Post-Training are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

TsinghuaC3I / SSRL
View on GitHub
SSRL: Self-Search Reinforcement Learning
☆210Aug 20, 2025Updated 11 months ago
PRIME-RL / Entropy-Mechanism-of-RL
View on GitHub
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆445Jul 11, 2025Updated last year
PRIME-RL / RL-Compositionality
View on GitHub
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
☆68Jan 26, 2026Updated 6 months ago
Simplified-Reasoning / LUFFY
View on GitHub
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆460Mar 20, 2026Updated 4 months ago
TsinghuaC3I / LLM4BioHypoGen
View on GitHub
[COLM 2024] Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
☆15Jul 15, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
yongliang-wu / DFT
View on GitHub
[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
☆588Jan 4, 2026Updated 6 months ago
TsinghuaC3I / Awesome-RL-for-LRMs
View on GitHub
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,469Nov 9, 2025Updated 8 months ago
zwhong714 / PSFT
View on GitHub
[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co…
☆38Sep 9, 2025Updated 10 months ago
Xuekai-Zhu / FlowRL
View on GitHub
☆180Nov 24, 2025Updated 8 months ago
PRIME-RL / TTRL
View on GitHub
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
☆1,103Apr 15, 2026Updated 3 months ago
GAIR-NLP / OctoThinker
View on GitHub
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189Jul 23, 2025Updated last year
TsinghuaC3I / ZEDA
View on GitHub
Post-Trained MoE Can Skip Half Experts via Self-Distillation
☆38May 19, 2026Updated 2 months ago
TheRoadQaQ / ReLIFT
View on GitHub
Official Repository of "Learning what reinforcement learning can't"
☆85Dec 30, 2025Updated 6 months ago
FrontisAI / NatureBench
View on GitHub
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
☆78Updated this week
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
suu990901 / KlearReasoner
View on GitHub
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆82Dec 25, 2025Updated 7 months ago
ruixin31 / Spurious_Rewards
View on GitHub
☆361Jul 29, 2025Updated 11 months ago
Trae1ounG / BuPO
View on GitHub
[arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
☆60Feb 6, 2026Updated 5 months ago
TsinghuaC3I / MARTI
View on GitHub
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
☆540Apr 14, 2026Updated 3 months ago
sail-sg / variational-reasoning
View on GitHub
Code for "Variational Reasoning for Language Models"
☆60Sep 29, 2025Updated 9 months ago
OpenBMB / RLPR
View on GitHub
Extrapolating RLVR to General Domains without Verifiers
☆205Aug 12, 2025Updated 11 months ago
zhengkid / Parallel-R1
View on GitHub
The offical repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"
☆260Feb 4, 2026Updated 5 months ago
LeapLabTHU / limit-of-RLVR
View on GitHub
repo for paper https://arxiv.org/abs/2504.13837
☆346Dec 17, 2025Updated 7 months ago
RUCAIBox / Passk_Training
View on GitHub
The official repository of paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models''
☆113Aug 15, 2025Updated 11 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
PRIME-RL / P1
View on GitHub
P1: Mastering Physics Olympiads with Reinforcement Learning
☆89Dec 29, 2025Updated 6 months ago
PRIME-RL / PRIME
View on GitHub
Scalable RL solution for advanced reasoning of language models
☆1,865Mar 18, 2025Updated last year
emmyqin / iw_sft
View on GitHub
☆28Jul 18, 2025Updated last year
jinzhuoran / RAG-RewardBench
View on GitHub
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18Dec 19, 2024Updated last year
BytedTsinghua-SIA / DAPO
View on GitHub
An Open-source RL System from ByteDance Seed and Tsinghua AIR
☆1,848May 11, 2025Updated last year
RyanLiu112 / GenPRM
View on GitHub
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆102Nov 8, 2025Updated 8 months ago
hkgc-1 / GHPO
View on GitHub
☆62Jul 21, 2025Updated last year
hiyouga / EasyR1
View on GitHub
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,082Updated this week
TIGER-AI-Lab / verl-tool
View on GitHub
A version of verl to support diverse tool use [TMLR 2026]
☆1,024Jul 15, 2026Updated last week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
agentscope-ai / Trinity-RFT
View on GitHub
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…
☆672Updated this week
llm-in-sandbox / llm-in-sandbox
View on GitHub
Computer Environments Elicit General Agentic Intelligence in LLMs
☆237Updated this week
multimodal-art-projection / REER_DeepWriter
View on GitHub
REverse-Engineered Reasoning for Open-Ended Generation
☆98Sep 10, 2025Updated 10 months ago
TianHongZXY / RLVR-Decomposed
View on GitHub
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆166Mar 2, 2026Updated 4 months ago
ltzheng / SimpleTIR
View on GitHub
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401Mar 30, 2026Updated 3 months ago
verl-project / verl
View on GitHub
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,667Updated this week
rllm-org / rllm
View on GitHub
Democratizing Reinforcement Learning for LLMs
☆5,732Updated this week