hao-ai-lab / d3LLM
d3LLM: Ultra-Fast Diffusion LLM
★33 · Updated this week
Alternatives and similar repositories for d3LLM
Users interested in d3LLM are comparing it to the libraries listed below.
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models · ★123 · Updated 6 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring · ★256 · Updated 5 months ago
- ★93 · Updated last week
- ★207 · Updated 3 weeks ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… · ★186 · Updated last month
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference · ★214 · Updated 2 months ago
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention · ★146 · Updated last month
- ★62 · Updated 5 months ago
- A lightweight inference engine built for block diffusion models · ★37 · Updated last week
- Code for the paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference · ★158 · Updated 2 months ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter · ★105 · Updated last week
- A sparse attention kernel supporting mixed sparse patterns · ★406 · Updated 10 months ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity · ★64 · Updated 5 months ago
- A Collection of Papers on Diffusion Language Models · ★149 · Updated 3 months ago
- ★31 · Updated 3 months ago
- ★87 · Updated 6 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] · ★60 · Updated 2 months ago
- A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach… · ★51 · Updated last month
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification · ★31 · Updated 8 months ago
- [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints · ★78 · Updated 5 months ago
- The official implementation of dLLM-Var · ★26 · Updated last month
- Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation · ★80 · Updated 5 months ago
- A paper list, tutorial, and nano code snippets for Diffusion Large Language Models · ★139 · Updated 5 months ago
- Efficient Triton implementation of Native Sparse Attention · ★254 · Updated 6 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ★126 · Updated 5 months ago
- ★187 · Updated 11 months ago
- [NeurIPS'25] The official code implementation of the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… · ★67 · Updated this week
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" · ★122 · Updated 3 weeks ago
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention · ★252 · Updated 2 weeks ago
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient · ★63 · Updated 2 months ago