liumy2010/UFT

UFT: Unifying Supervised and Reinforcement Fine-Tuning

☆31

Alternatives and similar repositories for UFT

Users that are interested in UFT are comparing it to the libraries listed below.

shiweijiezero / R3L
☆23 · Apr 5, 2026 · Updated 3 months ago
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
T-Lab-CUHKSZ / G2RPO-A
[ACL 2026] G2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
☆16 · May 20, 2026 · Updated 2 months ago
zhyang2226 / AR-Lopti
[ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
☆46 · May 20, 2025 · Updated last year
intervention-training / int
☆16 · Feb 4, 2026 · Updated 5 months ago
TheRoadQaQ / ReLIFT
Official Repository of "Learning what reinforcement learning can't"
☆85 · Dec 30, 2025 · Updated 6 months ago
snu-mllab / Context-Memory
PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆63 · Apr 18, 2024 · Updated 2 years ago
jwhj / OREO
☆116 · Jan 21, 2025 · Updated last year
WujiangXu / EPO
The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"
☆40 · Jul 13, 2026 · Updated last week
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆460 · Mar 20, 2026 · Updated 4 months ago
whyNLP / PCCoT
Parallel Continuous Chain-of-Thought with Jacobi Iteration. Accepted to EMNLP 2025.
☆23 · Mar 29, 2026 · Updated 3 months ago
hkgc-1 / GHPO
☆62 · Jul 21, 2025 · Updated last year
zhaoxlpku / PromptCoT
☆17 · Apr 10, 2025 · Updated last year
Linzwcs / AFT
☆13 · Jan 22, 2025 · Updated last year
MasterVito / DAC-RL
Official Repo for DAC-RL: Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
☆16 · Feb 26, 2026 · Updated 5 months ago
microsoft / Simia-Agent-Training
Official Implementation of "Simulating Environments with Reasoning Models for Agent Training"
☆65 · Feb 18, 2026 · Updated 5 months ago
RUCBM / DeepCritic
Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"
☆41 · Jun 24, 2025 · Updated last year
FreedomIntelligence / PlatoLM
A trainable user simulator
☆34 · Jun 30, 2025 · Updated last year
quchangle1 / MatchTIR
The implementation for ACL 2026: MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching.
☆20 · Apr 18, 2026 · Updated 3 months ago
lemon-prog123 / LongRePS
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision
☆19 · Apr 1, 2025 · Updated last year
kokolerk / TCOD
[COLM 2026] TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
☆86 · Jul 9, 2026 · Updated 2 weeks ago
princeton-pli / retaining-by-doing
☆44 · Dec 25, 2025 · Updated 7 months ago
purbeshmitra / MOTIF
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
☆17 · Jul 6, 2025 · Updated last year
shawnricecake / Heima
[ICML 2026] Heima
☆75 · May 20, 2026 · Updated 2 months ago
pUmpKin-Co / ComplementaryRL
Co-evolving policy actors and experience extractors for efficient experience-driven agent RL
☆51 · May 12, 2026 · Updated 2 months ago
thu-ml / LM-Calibration
☆17 · May 31, 2023 · Updated 3 years ago
aeroplanepaper / GRPO-LEAD
☆40 · Nov 18, 2025 · Updated 8 months ago
abdelfattah-lab / SplitReason
☆20 · Mar 18, 2026 · Updated 4 months ago
Zoeyyao27 / SirLLM
This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM
☆60 · May 28, 2024 · Updated 2 years ago
thu-ml / TetraJet-v2-NVFP4Training
[ICML 2026 Spotlight] Official implementation of TetraJet-v2: Accurate NVFP4 Training for LLMs, with fully-NVFP4 linear layer with unbias…
☆17 · Jul 3, 2026 · Updated 3 weeks ago
yanyanSann / PromptTPP
PyTorch Implementation of Prompt-augmented Temporal Point Process for Streaming Event Sequence, NeurIPS 2023
☆14 · Dec 9, 2023 · Updated 2 years ago
yifeiwang77 / Self-Correction
☆20 · Nov 3, 2024 · Updated last year
microsoft / Interactive-Summarization
The official repo of our research work "Interactive Editing for Text Summarization".
☆23 · Jun 3, 2023 · Updated 3 years ago
callsys / GMPO
[ICLR 2026] Geometric-Mean Policy Optimization
☆104 · Jan 26, 2026 · Updated 6 months ago
schinger / AlphaZero
Simplest AlphaZero Implementation
☆26 · Nov 6, 2024 · Updated last year
analokmaus / kaggle-aimo2-fast-math-r1
Kaggle AIMO2 solution with token-efficient reasoning LLM recipes
☆50 · Aug 7, 2025 · Updated 11 months ago
THU-KEG / LongWriter-V
[ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
☆24 · Mar 29, 2025 · Updated last year
HaoyueBaiZJU / NAS-OoD
☆12 · Nov 18, 2022 · Updated 3 years ago
qiuzh20 / RMoE
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆33 · Aug 4, 2024 · Updated last year