kuleshov-group / remdm
Remasking Discrete Diffusion Models with Inference-Time Scaling
☆34 Updated 4 months ago
Alternatives and similar repositories for remdm
Users who are interested in remdm are comparing it to the repositories listed below.
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆116 Updated last week
- Official Code Repository for the paper "Continuous Diffusion Model for Language Modeling". ☆38 Updated 4 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆27 Updated 2 months ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆55 Updated 4 months ago
- ☆17 Updated 6 months ago
- ☆48 Updated last month
- ☆33 Updated 4 months ago
- ☆32 Updated 2 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆53 Updated 7 months ago
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆81 Updated last year
- ☆77 Updated 4 months ago
- Large Language Diffusion with Ordered Unmasking ☆38 Updated last month
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. ☆86 Updated 2 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆177 Updated 3 weeks ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 Updated last year
- ☆103 Updated 2 years ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆91 Updated 2 months ago
- Stick-breaking attention ☆58 Updated 2 weeks ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆110 Updated 10 months ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆28 Updated 3 months ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆25 Updated 6 months ago
- ☆33 Updated 6 months ago
- Unofficial Implementation of Selective Attention Transformer ☆17 Updated 8 months ago
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆242 Updated last month
- Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP". ☆77 Updated 3 weeks ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆39 Updated last week
- Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models" ☆32 Updated 3 weeks ago
- ☆82 Updated 10 months ago
- Implementation of the proposed MaskBit from Bytedance AI ☆82 Updated 8 months ago
- Official Code for Paper "Think While You Generate: Discrete Diffusion with Planned Denoising" [ICLR 2025] ☆68 Updated 2 months ago