SJTU-DENG-Lab / Diffulex
Flexible and Pluggable Serving Engine for Diffusion LLMs
★55 · Updated last week
Alternatives and similar repositories for Diffulex
Users interested in Diffulex are comparing it to the libraries listed below.
- Implementation of FP8/INT8 rollout for RL training without performance drop. ★288 · Updated 2 months ago
- d3LLM: Ultra-Fast Diffusion LLM ★83 · Updated 3 weeks ago
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ★160 · Updated 3 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding ★459 · Updated this week
- ★129 · Updated 7 months ago
- ★132 · Updated 8 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ★197 · Updated 2 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ★63 · Updated 3 months ago
- LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ★116 · Updated 2 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ★816 · Updated last week
- ★23 · Updated last month
- dInfer: An Efficient Inference Framework for Diffusion Language Models ★410 · Updated 3 weeks ago
- Efficient Triton implementation of Native Sparse Attention ★262 · Updated 8 months ago
- ★221 · Updated 2 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ★267 · Updated 6 months ago
- [ASPLOS '26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ★130 · Updated 2 months ago
- A minimal training framework for scaling FLA models ★343 · Updated 2 months ago
- An evaluation framework for training-free sparse attention in LLMs ★114 · Updated last week
- ★110 · Updated 4 months ago
- NexRL: an ultra-loosely-coupled LLM post-training framework ★97 · Updated this week
- Training library for Megatron-based models with bidirectional Hugging Face conversion capability ★400 · Updated this week
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, letting developers focus on algorithm… ★99 · Updated 5 months ago
- Spectral Sphere Optimizer ★90 · Updated 3 weeks ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training ★258 · Updated 5 months ago
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference ★238 · Updated 2 weeks ago
- Triton implementation of FlashAttention-2 with support for custom masks ★165 · Updated last year
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ★188 · Updated 4 months ago
- Kinetics: Rethinking Test-Time Scaling Laws ★85 · Updated 6 months ago
- qwen-nsa ★87 · Updated 3 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ★203 · Updated 2 months ago