Reinforcement Learning via Self-Distillation (SDPO)
☆519 · Feb 18, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for SDPO
Users who are interested in SDPO are comparing it to the repositories listed below.
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs · ☆41 · May 20, 2025 · Updated 9 months ago
- ☆355 · Feb 20, 2026 · Updated last week
- 16-fold memory access reduction with nearly no loss · ☆108 · Mar 26, 2025 · Updated 11 months ago
- [ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2 & [ICLR 2025] Mask-DPO · ☆63 · Apr 30, 2025 · Updated 10 months ago
- MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models · ☆40 · Jan 28, 2026 · Updated last month
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning" · ☆94 · Nov 8, 2025 · Updated 3 months ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning" · ☆117 · Feb 4, 2026 · Updated last month
- Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap" · ☆126 · Jan 24, 2026 · Updated last month
- ☆352 · Jul 29, 2025 · Updated 7 months ago
- The official implementation of the paper "Self-Updatable Large Language Models by Integrating Context into Model Parameters" · ☆15 · May 18, 2025 · Updated 9 months ago
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations · ☆19 · Jan 19, 2025 · Updated last year
- Image Tokenizer Needs Post-Training · ☆24 · Oct 4, 2025 · Updated 5 months ago
- [CIKM 2022] Towards Automated Over-Sampling for Imbalanced Classification · ☆10 · Mar 20, 2023 · Updated 2 years ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" · ☆49 · Jan 30, 2026 · Updated last month
- ☆14 · Jul 17, 2025 · Updated 7 months ago
- microRTPS agent side of the microRTPS bridge. Used to interface PX4 with the DDS world through FastRTPS/FastDDS · ☆10 · Sep 17, 2023 · Updated 2 years ago
- Official codebase for the NeurIPS 2023 paper: Towards Last-layer Retraining for Group Robustness with Fewer Annotations. https://arxiv.or… · ☆12 · May 15, 2024 · Updated last year
- Project of ACL 2025 "UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models" · ☆14 · Mar 25, 2025 · Updated 11 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers · ☆57 · Mar 6, 2025 · Updated 11 months ago
- ☆36 · Jan 13, 2026 · Updated last month
- Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation · ☆28 · Dec 10, 2025 · Updated 2 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling · ☆472 · May 17, 2025 · Updated 9 months ago
- Official Repo for Open-Reasoner-Zero · ☆2,087 · Jun 2, 2025 · Updated 9 months ago
- ☆30 · Nov 5, 2024 · Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning · ☆92 · Feb 14, 2025 · Updated last year
- [ICLR 2026] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation · ☆121 · Feb 15, 2026 · Updated 2 weeks ago
- ☆15 · Oct 4, 2024 · Updated last year
- Use the tokenizer in parallel to achieve superior acceleration · ☆20 · Mar 21, 2024 · Updated last year
- CoV: Chain-of-View Prompting for Spatial Reasoning · ☆51 · Jan 23, 2026 · Updated last month
- ☆18 · Feb 20, 2025 · Updated last year
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] · ☆65 · Oct 2, 2025 · Updated 5 months ago
- Simple RL training for reasoning · ☆3,830 · Dec 23, 2025 · Updated 2 months ago
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis' · ☆32 · May 19, 2025 · Updated 9 months ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training · ☆222 · May 31, 2025 · Updated 9 months ago
- ☆44 · Feb 13, 2026 · Updated 2 weeks ago
- Scalable toolkit for efficient model reinforcement · ☆1,372 · Updated this week
- Training Large Language Model to Reason in a Continuous Latent Space · ☆1,522 · Aug 12, 2025 · Updated 6 months ago
- Code for Quiet-STaR · ☆741 · Aug 21, 2024 · Updated last year
- [ICLR 2026] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling · ☆34 · Feb 25, 2026 · Updated last week