yongliang-wu / DFT
[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
☆536 · Jan 4, 2026 · Updated last month
Alternatives and similar repositories for DFT
Users who are interested in DFT compare it to the libraries listed below.
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Dec 25, 2025 · Updated last month
- A Sober Look at Language Model Reasoning ☆92 · Nov 18, 2025 · Updated 2 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆19,132 · Updated this week
- ☆352 · Jul 29, 2025 · Updated 6 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆408 · Nov 21, 2025 · Updated 2 months ago
- ☆16 · Jul 29, 2025 · Updated 6 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆188 · Jun 25, 2025 · Updated 7 months ago
- [NeurIPS 2025] TTRL: Test-Time Reinforcement Learning ☆989 · Sep 26, 2025 · Updated 4 months ago
- ☆13 · Sep 12, 2024 · Updated last year
- An Ultra-Long Output Reinforcement Learning Approach ☆23 · Jul 31, 2025 · Updated 6 months ago
- Code Repository for Blog - How to Productionize Large Language Models (LLMs) ☆12 · Mar 27, 2024 · Updated last year
- ☆13 · May 23, 2025 · Updated 8 months ago
- Simple RL training for reasoning ☆3,827 · Dec 23, 2025 · Updated last month
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. ☆840 · May 14, 2025 · Updated 9 months ago
- ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆281 · May 9, 2025 · Updated 9 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆420 · Jul 11, 2025 · Updated 7 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆701 · Oct 15, 2025 · Updated 4 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆416 · Oct 4, 2025 · Updated 4 months ago
- ☆263 · May 14, 2025 · Updated 9 months ago
- Please visit https://thuhcsi.github.io/SnakeGAN/ ☆37 · Apr 25, 2023 · Updated 2 years ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL) ☆8,989 · Feb 6, 2026 · Updated last week
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization ☆110 · May 20, 2025 · Updated 8 months ago
- Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs" ☆29 · Oct 12, 2025 · Updated 4 months ago
- The official implementation of the paper “Anchored Supervised Fine-Tuning” ☆30 · Updated this week
- ScalingOpt - Optimization Community ☆78 · Feb 4, 2026 · Updated last week
- ☆23 · Nov 20, 2021 · Updated 4 years ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆2,512 · Jan 25, 2026 · Updated 3 weeks ago
- The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models ☆17 · Oct 4, 2024 · Updated last year
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆283 · Sep 25, 2025 · Updated 4 months ago
- A Tiny Project For ASR model training and Deployment ☆26 · Oct 14, 2022 · Updated 3 years ago
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆945 · Feb 16, 2025 · Updated last year
- Control LLM ☆22 · Apr 6, 2025 · Updated 10 months ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆511 · Oct 20, 2024 · Updated last year
- Extrapolating RLVR to General Domains without Verifiers ☆201 · Aug 12, 2025 · Updated 6 months ago
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆354 · Jan 12, 2026 · Updated last month
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) ☆691 · Sep 24, 2025 · Updated 4 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation ☆33 · Oct 11, 2025 · Updated 4 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆218 · Nov 27, 2025 · Updated 2 months ago
- poorman's ar-dit tts ☆45 · Dec 31, 2025 · Updated last month