idanshen / Self-Distillation
☆70 · Updated this week
Alternatives and similar repositories for Self-Distillation
Users interested in Self-Distillation are comparing it to the repositories listed below.
- ☆50 · Updated 11 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆116 · Updated last week
- Exploration of automated dataset selection approaches at large scales. ☆52 · Updated 10 months ago
- ☆33 · Updated last year
- Verifiers for LLM Reinforcement Learning ☆80 · Updated 9 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆46 · Updated 5 months ago
- ☆75 · Updated last year
- ☆98 · Updated 3 weeks ago
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆64 · Updated this week
- When Reasoning Meets Its Laws ☆34 · Updated last month
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆68 · Updated 9 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 · Updated last year
- Process Reward Models That Think ☆77 · Updated 2 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆63 · Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆40 · Updated last year
- Defeating the Training-Inference Mismatch via FP16 ☆180 · Updated 2 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆44 · Updated 8 months ago
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge ☆94 · Updated 2 weeks ago
- ☆89 · Updated 3 months ago
- ☆91 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆125 · Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 · Updated 5 months ago
- ☆35 · Updated 8 months ago
- ☆112 · Updated last year
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆55 · Updated 6 months ago
- From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆57 · Updated last week
- Reinforcing General Reasoning without Verifiers ☆93 · Updated 7 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆94 · Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆32 · Updated last year