[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.
☆36 · Sep 9, 2025 · Updated 5 months ago
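The description above can be read as a PPO-style clipped surrogate in which every target token gets the same advantage. The sketch below illustrates that reading only; it is not taken from the PSFT repository. The function name `clipped_surrogate_loss`, the clip width `eps`, and the choice of a constant advantage of 1.0 are all assumptions made for illustration.

```python
import math


def clipped_surrogate_loss(new_logprobs, old_logprobs, eps=0.2, advantage=1.0):
    """Hypothetical PPO-style clipped loss over target tokens.

    new_logprobs: log-probabilities of the target tokens under the current policy.
    old_logprobs: log-probabilities of the same tokens under a reference (old) policy.
    eps:          assumed clip width that bounds how far the ratio may drift.
    advantage:    constant advantage shared by all tokens (assumed to be 1.0).
    """
    if not new_logprobs or len(new_logprobs) != len(old_logprobs):
        raise ValueError("log-prob lists must be non-empty and of equal length")

    total = 0.0
    for new_lp, old_lp in zip(new_logprobs, old_logprobs):
        # Probability ratio between the current and the old policy for this token.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Pessimistic (minimum) surrogate, as in the PPO clipped objective.
        total += min(ratio * advantage, clipped * advantage)

    # Negate the mean surrogate so that minimizing the loss maximizes the objective.
    return -total / len(new_logprobs)
```

With a positive constant advantage, the surrogate stops growing once a token's ratio exceeds `1 + eps`, so tokens whose probability has already risen that far add only a constant to the objective. That is one way a constraint on policy drift can arise under this reading.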
Alternatives and similar repositories for PSFT
Users that are interested in PSFT are comparing it to the libraries listed below
- ☆27 · Jul 18, 2025 · Updated 7 months ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 · Jun 5, 2025 · Updated 9 months ago
- The official repository of the NeurIPS'25 paper "Ada-R1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆22 · Nov 9, 2025 · Updated 3 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 · Apr 20, 2024 · Updated last year
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient ☆66 · Aug 3, 2025 · Updated 7 months ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 · Apr 15, 2024 · Updated last year
- Test-Time Label-Shift Adaptation ☆13 · May 24, 2023 · Updated 2 years ago
- Generate Quiz Question from PDF/Text files ☆11 · Feb 2, 2024 · Updated 2 years ago
- [ICCV 2025 DeepID Challenge] Official 1st Place in both tracks (Detection & Localization) ☆17 · Dec 24, 2025 · Updated 2 months ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity ☆22 · Aug 28, 2025 · Updated 6 months ago
- Compute training dynamics, plot data cartography, analysing data quality... ☆42 · Nov 10, 2022 · Updated 3 years ago
- ☆10 · Oct 20, 2023 · Updated 2 years ago
- PyTorch implementation of Bezier simplex fitting ☆12 · Feb 28, 2026 · Updated last week
- Code accompanying the 2022 DLS paper "Misleading Deep-Fake Detection with GAN Fingerprints" ☆10 · May 26, 2022 · Updated 3 years ago
- Official implementation for Text Generation Beyond Discrete Token Sampling ☆21 · Aug 11, 2025 · Updated 6 months ago
- BFloat16 Fused Adam Operator for PyTorch ☆16 · Nov 16, 2024 · Updated last year
- VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation ☆13 · Mar 4, 2022 · Updated 4 years ago
- Ling-Coder-Lite is a MoE LLM provided and open-sourced by CodeFuse and InclusionAI. ☆14 · Apr 22, 2025 · Updated 10 months ago
- ☆13 · Mar 25, 2022 · Updated 3 years ago
- ☆18 · May 3, 2025 · Updated 10 months ago
- This code was written quite some time ago for the purpose of processing the NGSIM dataset. While it might not be the epitome of organizat… ☆10 · Oct 5, 2023 · Updated 2 years ago
- COVID-19 corpus with annotated biomedical entities. ☆11 · Jun 2, 2021 · Updated 4 years ago
- Code for reproducing our paper: LMSOC: An Approach for Socially Sensitive Pretraining ☆13 · Oct 22, 2021 · Updated 4 years ago
- Build a Slurm Cluster using SaltStack in virtual machines ☆12 · Nov 26, 2018 · Updated 7 years ago
- ☆12 · Apr 25, 2025 · Updated 10 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 7 months ago
- Python bindings for NVIDIA CUDA APIs. ☆13 · Mar 2, 2024 · Updated 2 years ago
- ALAS: Autonomous Learning Agent System ☆15 · Aug 14, 2025 · Updated 6 months ago
- ☆10 · Nov 1, 2019 · Updated 6 years ago
- A script to convert ERNIE1.0 or ERNIE2.0 models to transformers' BERT format ☆10 · Mar 28, 2020 · Updated 5 years ago
- ☆14 · Aug 13, 2025 · Updated 6 months ago
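- Reference solutions for the online judge (OJ) of the Data and Algorithms course, Department of Electronic Engineering, Tsinghua University, Fall 2022 semester ☆10 · Jun 18, 2023 · Updated 2 years ago
- This project provides XLNet pre-trained models for Chinese, aiming to enrich Chinese natural language processing resources and offer a more diverse selection of Chinese pre-trained models. Experts and scholars are welcome to download and use them and to help advance the development of Chinese-language resources. ☆11 · May 30, 2023 · Updated 2 years ago
- A label marking tool for deep learning ☆12 · Jun 9, 2018 · Updated 7 years ago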
- ☆10 · Sep 18, 2021 · Updated 4 years ago
- ☆20 · Jul 23, 2025 · Updated 7 months ago
- Towards a Unified View of Large Language Model Post-Training ☆204 · Sep 8, 2025 · Updated 5 months ago
- Training Vision Transformers for Semi-Supervised Semantic Segmentation ☆14 · Nov 3, 2025 · Updated 4 months ago
- Entry for the MIPS track of the 2023 Loongson Cup ☆14 · Dec 23, 2023 · Updated 2 years ago