zwhong714/PSFT

[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.

☆38
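The description above frames SFT as a policy gradient with a constant advantage and adds a trust-region-style limit on policy drift. Below is a minimal sketch of that idea. It assumes a PPO-style clipped importance ratio and a constant advantage of 1 on every demonstration token. The function names and the `eps` parameter are hypothetical and are not taken from the PSFT repository.

```python
import math
from typing import List, Sequence


def clipped_sft_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    eps: float = 0.2,
    advantage: float = 1.0,
) -> float:
    """Mean PPO-style clipped surrogate over demonstration tokens.

    Hypothetical sketch: every token shares one constant advantage. At
    pi_new == pi_old the gradient of ratio * advantage equals
    advantage * grad log pi, i.e. the plain SFT gradient when advantage == 1.
    The clip stops pushing a token once its ratio exceeds 1 + eps.
    """
    if len(logp_new) != len(logp_old) or len(logp_new) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for new, old in zip(logp_new, logp_old):
        ratio = math.exp(new - old)  # pi_new(token) / pi_old(token)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * advantage, clipped * advantage)
    return total / len(logp_new)


def tokens_with_gradient(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    eps: float = 0.2,
    advantage: float = 1.0,
) -> List[int]:
    """Indices where the unclipped term is selected, so gradient flows via the ratio."""
    active = []
    for i, (new, old) in enumerate(zip(logp_new, logp_old)):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        if ratio * advantage <= clipped * advantage:
            active.append(i)
    return active
```

With `eps=0.2`, a token whose probability has already doubled (ratio 2.0) contributes only 1.2 and receives no gradient. An under-fit token (ratio 0.5) still contributes its full ratio, so under this sketch's assumptions the updates concentrate on tokens the policy has not yet drifted past.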

Alternatives and similar repositories for PSFT

Users that are interested in PSFT are comparing it to the libraries listed below.

emmyqin / iw_sft
☆28 · Jul 18, 2025 · Updated last year
ZhangXJ199 / EDGE-GRPO
Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
☆22 · Aug 28, 2025 · Updated 10 months ago
kkk-an / UltraIF
Code for the EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.
☆21 · Apr 3, 2025 · Updated last year
ibisbill / Transferability-of-LLM-Reasoning
☆111 · Jul 6, 2026 · Updated 3 weeks ago
Lauorie / DFT
Reproduced the DFT method without using Verl. https://arxiv.org/abs/2508.05629
☆24 · Oct 14, 2025 · Updated 9 months ago
zhuchichi56 / ASFT
[ICLR 2026] The official implementation of the paper “Anchored Supervised Fine-Tuning”
☆47 · Jun 19, 2026 · Updated last month
AndreHe02 / rewarding-unlikely-release
☆15 · Jun 10, 2025 · Updated last year
TsinghuaC3I / Unify-Post-Training
Towards a Unified View of Large Language Model Post-Training
☆211 · Sep 8, 2025 · Updated 10 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆205 · Aug 12, 2025 · Updated 11 months ago
mandyyyyii / east
☆19 · Aug 4, 2025 · Updated 11 months ago
jinhangzhan / RL_Heals_SFT
☆21 · Mar 22, 2026 · Updated 4 months ago
liziniu / GEM
Code for the paper "Preserving Diversity in Supervised Fine-tuning of Large Language Models"
☆58 · May 12, 2025 · Updated last year
Optimization-AI / DisCO
NeurIPS 2025: Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
☆53 · Mar 14, 2026 · Updated 4 months ago
lime-RL / DCPO
DCPO: Dynamic Adaptive Clipping for RL
☆49 · Apr 1, 2026 · Updated 3 months ago
chinsengi / dUltra-os
dUltra: Ultra-Fast Diffusion Large Language Models via Reinforcement Learning
☆16 · Jul 11, 2026 · Updated 2 weeks ago
RUCAIBox / Passk_Training
The official repository of the paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
☆113 · Aug 15, 2025 · Updated 11 months ago
EvanZhuang / mixinputs
Official implementation of "Text Generation Beyond Discrete Token Sampling"
☆26 · Aug 11, 2025 · Updated 11 months ago
zjr2000 / REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
☆20 · Jul 17, 2024 · Updated 2 years ago
facebookresearch / gen_dgrl
Official codebase for "The Generalization Gap in Offline Reinforcement Learning" accepted to ICLR 2024
☆29 · Apr 8, 2026 · Updated 3 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆445 · Jul 11, 2025 · Updated last year
StarDewXXX / AdaR1
The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆24 · May 6, 2026 · Updated 2 months ago
DerrickYLJ / LessIsMore
[ICML 2026] Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
☆34 · Sep 12, 2025 · Updated 10 months ago
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆103 · Oct 15, 2025 · Updated 9 months ago
ChnQ / MI-Peaks
☆68 · Jul 14, 2025 · Updated last year
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆460 · Mar 20, 2026 · Updated 4 months ago
maple-research-lab / SLOT
☆112 · Jun 15, 2025 · Updated last year
WHU-ZQH / DUP
☆16 · Mar 6, 2025 · Updated last year
kq-chen / VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
☆15 · Feb 17, 2025 · Updated last year
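InuyashaYang / AIDIY
JoinAI is an open-source repository focused on building algorithm-engineering skills, including compiled notes on engineering and mathematical principles.
☆11 · Apr 20, 2025 · Updated last year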
microsoft / SuperRL
☆15 · Sep 8, 2025 · Updated 10 months ago
chen-hao-chao / dlsm
[ICLR 2022] Denoising Likelihood Score Matching for Conditional Score-based Data Generation
☆11 · Jun 15, 2026 · Updated last month
WooooDyy / BAPO
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…
☆94 · Jan 29, 2026 · Updated 5 months ago
Mia-Cong / SWIFT
Official implementation of "Can Test-Time Scaling Improve World Foundation Model?"
☆15 · Jul 12, 2025 · Updated last year
M1n9X / GraphRAG_Lite
☆16 · Jul 12, 2024 · Updated 2 years ago
MikaStars39 / PeRL
PeRL: Parameter-Efficient Reinforcement Learning
☆82 · May 20, 2026 · Updated 2 months ago
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆97 · Jun 16, 2025 · Updated last year
CLAIRE-Labo / quantile-reward-policy-optimization
Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok …
☆30 · Dec 8, 2025 · Updated 7 months ago
instadeepai / outer-value-function-meta-rl
Code for the paper "Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function"
☆13 · Apr 13, 2026 · Updated 3 months ago
Infini-AI-Lab / Kinetics
Kinetics: Rethinking Test-Time Scaling Laws
☆87 · Jul 11, 2025 · Updated last year