Crys-Chen / DPad
Official implementation of "DPad: Efficient Diffusion Language Models with Suffix Dropout"
☆50 · Updated last month
Alternatives and similar repositories for DPad
Users interested in DPad are comparing it to the libraries listed below.
- Implementation of FP8/INT8 rollout for RL training without performance drop ☆260 · Updated 2 weeks ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" ☆164 · Updated last month
- Code for the paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆143 · Updated this week
- [ICLR 2025] COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training ☆241 · Updated 2 months ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆58 · Updated 3 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆236 · Updated 3 months ago
- A Survey of Efficient Attention Methods: Hardware-Efficient, Sparse, Compact, and Linear Attention ☆189 · Updated last month
- A sparse attention kernel supporting mixed sparse patterns