zhijie-group / Discrete-Diffusion-Forcing

Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference

☆234

Alternatives and similar repositories for Discrete-Diffusion-Forcing

Users who are interested in Discrete-Diffusion-Forcing are comparing it to the repositories listed below.

maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆193 Updated last month
OpenMOSS / DiRL
☆126 Updated last week
Gen-Verse / dLLM-RL
TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
☆384 Updated 3 weeks ago
pengzhangzhi / Open-dLLM
The most open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.
☆499 Updated 2 months ago
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆773 Updated last month
horseee / dKV-Cache
[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models
☆128 Updated 7 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆258 Updated 7 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆421 Updated 3 months ago
mit-han-lab / flash-moba
☆216 Updated last month
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆263 Updated 6 months ago
inclusionAI / dFactory
Easy and Efficient dLLM Fine-Tuning
☆194 Updated 3 weeks ago
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mixed sparse patterns
☆430 Updated this week
yczhou001 / Awesome-Diffusion-LLM
paper list, tutorial, and nano code snippet for Diffusion Large Language Models.
☆148 Updated 6 months ago
HKUNLP / DiffuLLaMA
[ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models
☆359 Updated 7 months ago
thu-ml / SLA
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
☆233 Updated 2 weeks ago
TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆105 Updated 7 months ago
HanGuo97 / log-linear-attention
☆265 Updated 7 months ago
dllm-reasoning / d1
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
☆393 Updated 3 weeks ago
OpenSparseLLMs / MoM
☆115 Updated 3 months ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆104 Updated last year
ML-GSAI / Diffusion-LLM-Papers
A Collection of Papers on Diffusion Language Models
☆149 Updated 3 months ago
liangyuwang / Tiny-FSDP
Tiny-FSDP, a minimalistic re-implementation of the PyTorch FSDP
☆93 Updated 4 months ago
maple-research-lab / LLaDOU
Implementation of "Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models" [NeurIPS 2025]
☆69 Updated 3 weeks ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆333 Updated last month
svg-project / flash-kmeans
Fast and memory-efficient exact kmeans
☆133 Updated 2 months ago
prathebaselva / FORA
FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.
☆52 Updated last year
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆65 Updated 6 months ago
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆607 Updated this week
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆683 Updated 3 months ago
thu-nics / DiTFastAttn
☆189 Updated last year