dInfer: An Efficient Inference Framework for Diffusion Language Models
☆423 · Updated Feb 11, 2026
Alternatives and similar repositories for dInfer
Users interested in dInfer are comparing it to the repositories listed below.
- A lightweight inference engine built for block diffusion models (☆41, updated Dec 9, 2025)
- Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture. Train an MDM using GPT with this repo! (☆34, updated Jun 23, 2025)
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… (☆23, updated Oct 1, 2025)
- DeeperGEMM: a heavily optimized version (☆74, updated May 5, 2025)
- diffusers with a search engine (☆11, updated Jan 13, 2026)
- A fork of flux-fast that makes flux-fast even faster with cache-dit; 3.3x speedup on an NVIDIA L20. (☆24, updated Jul 18, 2025)
- 📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, etc. 🎉🎉 (☆15, updated Mar 30, 2025)
- LLaDA2.0 is the diffusion language model series developed by the InclusionAI team at Ant Group. (☆352, updated Feb 12, 2026)
- Wuhan University undergraduate thesis code: design and implementation of a graph federated learning system (☆14, updated Jun 5, 2023)
- [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale (☆25, updated Jul 31, 2025)
- Code for our work DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV 2024] (☆14, updated Jul 11, 2024)
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation (☆123, updated Dec 25, 2025)
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" (☆852, updated Jan 28, 2026)
- [ICLR 2026] Official code for TraceRL: revolutionizing post-training for diffusion LLMs, powering the SOTA TraDo series (☆435, updated Jan 28, 2026)
- (☆15, updated Dec 2, 2019)
- EleutherAI ML performance reading group repository (slides, meeting recordings, annotated papers) (☆26, updated Dec 19, 2025)
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp… (☆21, updated Mar 7, 2024)
- Multiple GEMM operators built with CUTLASS to support LLM inference (☆20, updated Aug 3, 2025)
- (☆42, updated Sep 8, 2025)
- [ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs (☆97, updated Jan 26, 2026)
- d3LLM: Ultra-Fast Diffusion LLM 🚀 (☆93, updated Feb 4, 2026)
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs (☆27, updated Dec 17, 2024)
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library (☆94, updated Dec 17, 2025)
- Official PyTorch implementation for "Large Language Diffusion Models" (☆3,609, updated Nov 12, 2025)
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference (☆112, updated Dec 31, 2025)
- Perplexity GPU kernels (☆567, updated Nov 7, 2025)
- A practical way of learning swizzle patterns (☆37, updated Feb 3, 2025)
- (☆52, updated May 19, 2025)
- SGEMM optimization with CUDA, step by step (☆21, updated Mar 23, 2024)
- FlashInfer Bench @ MLSys 2026: building AI agents to write high-performance GPU kernels (☆137, updated Feb 9, 2026)
- A collection of papers on diffusion language models (☆157, updated Sep 15, 2025)
- (☆104, updated Sep 9, 2024)
- [ICLR 2026] Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models" (☆162, updated Feb 16, 2026)
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25) (☆70, updated Apr 25, 2025)
- [arXiv 2024] I4VGen: Image as Free Stepping Stone for Text-to-Video Generation (☆24, updated Oct 6, 2024)
- (☆25, updated Sep 19, 2025)
- Incubator repo for the CUDA-TileIR backend (☆106, updated Feb 14, 2026)
- (☆65, updated Apr 26, 2025)
- Model souping for LLMs (☆72, updated Nov 18, 2025)