kuleshov-group / remdm
Remasking Discrete Diffusion Models with Inference-Time Scaling
☆26 Updated 3 months ago
Alternatives and similar repositories for remdm
Users who are interested in remdm are comparing it to the repositories listed below
- Official Code Repository for the paper "Continuous Diffusion Model for Language Modeling". ☆31 Updated 3 months ago
- Official Jax Implementation of MD4 Masked Diffusion Models ☆106 Updated 4 months ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆52 Updated 3 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆108 Updated last month
- Official Code for Paper "Think While You Generate: Discrete Diffusion with Planned Denoising" [ICLR 2025] ☆67 Updated 2 months ago
- Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP". ☆69 Updated last month
- Reward fine-tuning for Stable Diffusion models based on stochastic optimal control, including Adjoint Matching ☆36 Updated 3 weeks ago
- ☆33 Updated 3 months ago
- ☆32 Updated last month
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 Updated 9 months ago
- The codebase of our paper "Improving the Training of Rectified Flows", NeurIPS 2024 ☆114 Updated 8 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 Updated last year
- Code for the paper: "Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods" ☆22 Updated last month
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆38 Updated last month
- ☆17 Updated 5 months ago
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. ☆156 Updated this week
- The official repo of continuous speculative decoding ☆27 Updated 2 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆51 Updated 6 months ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆24 Updated 6 months ago
- Official PyTorch implementation for "Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data" (ICLR… ☆49 Updated 3 weeks ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆43 Updated 7 months ago
- ☆58 Updated last week
- ☆32 Updated last year
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆35 Updated this week
- ☆161 Updated this week
- ☆42 Updated 7 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆29 Updated 7 months ago
- ☆76 Updated 4 months ago
- Stick-breaking attention ☆57 Updated last week
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆27 Updated last month