yongliang-wu / DFT
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
☆498 Updated 2 weeks ago
Alternatives and similar repositories for DFT
Users who are interested in DFT are comparing it to the repositories listed below.
- ☆205 Updated 3 weeks ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ☆146 Updated last month
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning. ☆277 Updated 2 weeks ago
- Extrapolating RLVR to General Domains without Verifiers ☆179 Updated 3 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆381 Updated 4 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆160 Updated last month
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning ☆330 Updated 5 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆145 Updated 7 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025) ☆167 Updated 2 weeks ago
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆208 Updated last month
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆368 Updated last month
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond ☆315 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆257 Updated 6 months ago
- ☆212 Updated 9 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆615 Updated 8 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆318 Updated 2 months ago
- Paper collections of multi-modal LLM for Math/STEM/Code. ☆129 Updated this week
- ☆213 Updated last year
- A version of verl to support diverse tool use ☆685 Updated last week
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆423 Updated 6 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆404 Updated last week
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆386 Updated 10 months ago
- ☆165 Updated last month
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆348 Updated 2 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆297 Updated last year
- ☆309 Updated 5 months ago
- One-shot Entropy Minimization ☆187 Updated 5 months ago
- ☆326 Updated 5 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆190 Updated 8 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆189 Updated 4 months ago