NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆233 Updated 2 weeks ago
Alternatives and similar repositories for Fast-dLLM
Users interested in Fast-dLLM are comparing it to the libraries listed below.
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆113 Updated last week
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆166 Updated this week
- 🔥 A minimal training framework for scaling FLA models ☆178 Updated 2 weeks ago
- A sparse attention kernel supporting mixed sparse patterns ☆238 Updated 4 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆210 Updated last week
- ☆85 Updated 2 months ago
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization ☆71 Updated 3 weeks ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆131 Updated 2 months ago
- Efficient Triton implementation of Native Sparse Attention. ☆168 Updated last month
- ☆82 Updated last month
- Efficient Mixture of Experts for LLM Paper List ☆77 Updated 6 months ago
- ☆167 Updated 5 months ago
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆113 Updated last month
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework ☆355 Updated last month
- ☆191 Updated 2 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆128 Updated this week
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆81 Updated 6 months ago
- ☆104 Updated 2 weeks ago
- Paper list, tutorial, and nano code snippets for Diffusion Large Language Models. ☆59 Updated 3 weeks ago
- qwen-nsa ☆67 Updated 2 months ago
- ✈️ Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆69 Updated 2 months ago
- ☆152 Updated last week
- ☆45 Updated last week
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆213 Updated 3 weeks ago
- A Collection of Papers on Diffusion Language Models ☆81 Updated last week
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆105 Updated 11 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆102 Updated 3 months ago
- 16-fold memory access reduction with nearly no loss ☆99 Updated 3 months ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ☆54 Updated 3 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 Updated last year