JinjieNi / MegaDLMs
GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 training.
☆318 · Updated 2 months ago
Alternatives and similar repositories for MegaDLMs
Users that are interested in MegaDLMs are comparing it to the libraries listed below
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models ☆398 · Updated last month
- The most open diffusion language model for code generation: releasing pretraining, evaluation, inference, and checkpoints. ☆505 · Updated 2 months ago
- The official GitHub repo for "Diffusion Language Models are Super Data Learners". ☆219 · Updated 2 months ago
- Easy and Efficient dLLM Fine-Tuning ☆203 · Updated last week
- ☆379 · Updated 2 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆799 · Updated 2 months ago
- Official implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆398 · Updated last month
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆403 · Updated 3 weeks ago
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference ☆238 · Updated last week
- [ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆361 · Updated 7 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding ☆432 · Updated this week
- d3LLM: Ultra-Fast Diffusion LLM 🚀 ☆80 · Updated 2 weeks ago
- Esoteric Language Models ☆109 · Updated 2 months ago
- PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning ☆292 · Updated 2 weeks ago
- Official PyTorch implementation for the ICLR 2025 paper "Scaling up Masked Diffusion Models on Text" ☆360 · Updated last year
- LLaDA2.0 is the diffusion language model series developed by the InclusionAI team at Ant Group. ☆236 · Updated last month
- The official GitHub repo for the survey paper "A Survey on Diffusion Language Models". ☆715 · Updated last week
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆478 · Updated last week
- 📖 A repository for organizing papers, code, and other resources related to Latent Reasoning. ☆348 · Updated 2 months ago
- ☆137 · Updated last week
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆197 · Updated 2 months ago
- Spectral Sphere Optimizer ☆90 · Updated 2 weeks ago
- Implementations of and experiments on mHC by DeepSeek: https://arxiv.org/abs/2512.24880 ☆265 · Updated 3 weeks ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆227 · Updated 2 months ago
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆477 · Updated 2 months ago
- ☆110 · Updated 4 months ago
- A collection of papers on discrete diffusion models ☆168 · Updated 6 months ago
- Paper list, tutorial, and nano code snippets for Diffusion Large Language Models. ☆152 · Updated last week
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆403 · Updated 2 months ago
- Implementation of the Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin… ☆67 · Updated 4 months ago