zhijie-group / Diffulex

Flexible and Pluggable Serving Engine for Diffusion LLMs

☆51

Alternatives and similar repositories for Diffulex

Users who are interested in Diffulex are comparing it to the libraries listed below.

yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performance drop.
☆283Updated 2 months ago
inclusionAI / dInfer
dInfer: An Efficient Inference Framework for Diffusion Language Models
☆396Updated 2 weeks ago
OpenSparseLLMs / Linear-MoE
☆127Updated 7 months ago
Dao-AILab / grouped-latent-attention
☆132Updated 7 months ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆160Updated 3 months ago
mit-han-lab / fastrl
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆126Updated last month
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆196Updated 2 months ago
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆112Updated 2 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆335Updated 2 months ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆104Updated last year
z-lab / dflash
Block Diffusion for Ultra-Fast Speculative Decoding
☆349Updated 2 weeks ago
Unakar / Spectral-Sphere-Optimizer
Spectral Sphere Optimizer
☆45Updated last week
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆790Updated last month
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆187Updated 3 months ago
Infini-AI-Lab / Multiverse
☆110Updated 4 months ago
mit-han-lab / flash-moba
☆220Updated 2 months ago
ruipeterpan / specreason
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆61Updated 3 months ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆147Updated 3 weeks ago
NVIDIA-NeMo / Megatron-Bridge
Training library for Megatron-based models with bi-directional Hugging Face conversion capability
☆363Updated this week
fla-org / hybrid-distillation
☆22Updated 3 weeks ago
hao-ai-lab / d3LLM
d3LLM: Ultra-Fast Diffusion LLM 🚀
☆60Updated last week
ISEEKYAN / mbridge
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆183Updated last week
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆109Updated 9 months ago
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆257Updated 5 months ago
XunhaoLai / native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
☆260Updated 7 months ago
thunlp / JustRL
☆196Updated 3 weeks ago
OpenSparseLLMs / MoM
☆115Updated 4 months ago
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆87Updated 3 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆110Updated 3 months ago
inclusionAI / dFactory
Easy and Efficient dLLM Fine-Tuning
☆195Updated last month