dllm-reasoning / d1
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
☆213 · Updated last week
Alternatives and similar repositories for d1
Users who are interested in d1 are comparing it to the repositories listed below.
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆213 · Updated 3 weeks ago
- Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text" ☆230 · Updated 6 months ago
- ☆292 · Updated last week
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆160 · Updated 3 months ago
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example” ☆290 · Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆222 · Updated last month
- ☆152 · Updated last week
- ☆203 · Updated 4 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆191 · Updated this week
- [NeurIPS 2024] Simple and Effective Masked Diffusion Language Model ☆433 · Updated 3 weeks ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆175 · Updated this week
- Repo for paper https://arxiv.org/abs/2504.13837 ☆158 · Updated last month
- A brief and partial summary of RLHF algorithms. ☆129 · Updated 3 months ago
- ☆190 · Updated 2 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆94 · Updated 3 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆383 · Updated 2 weeks ago
- Repo of paper "Free Process Rewards without Process Labels" ☆153 · Updated 3 months ago
- Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training ☆281 · Updated last month
- ☆220 · Updated last month
- ☆300 · Updated 3 weeks ago
- A version of verl to support tool use ☆251 · Updated last week
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆221 · Updated last month
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆238 · Updated last month
- Paper List of Inference/Test Time Scaling/Computing ☆264 · Updated last week
- AnchorAttention: Improved attention for LLMs long-context training ☆208 · Updated 5 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆141 · Updated 2 weeks ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆159 · Updated 3 weeks ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆233 · Updated 2 weeks ago
- Code for the paper: "Learning to Reason without External Rewards" ☆295 · Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆131 · Updated 2 months ago