xdit-project / xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
☆714, updated this week
Related projects
Alternatives and complementary repositories for xDiT
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models (☆590, updated 2 weeks ago)
- Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-t… (☆403, updated this week)
- 📒 A small curated list of Awesome Diffusion Inference Papers with codes (☆96, updated this week)
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference (☆357, updated this week)
- Model Compression Toolbox for Large Language Models and Diffusion Models (☆222, updated last week)
- Ring attention implementation with flash attention (☆585, updated last week)
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free (☆800, updated 4 months ago)
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models (☆327, updated this week)
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models (☆331, updated 8 months ago)
- Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs (☆1,186, updated 4 months ago)
- FlashInfer: Kernel Library for LLM Serving (☆1,452, updated this week)
- Disaggregated serving system for Large Language Models (LLMs) (☆359, updated 3 months ago)
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving (☆443, updated last week)
- Zero Bubble Pipeline Parallelism (☆281, updated last week)
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… (☆322, updated this week)
- A fast communication-overlapping library for tensor parallelism on GPUs (☆224, updated 3 weeks ago)
- FlagGems is an operator library for large language models implemented in the Triton language (☆342, updated this week)
- Analyze the inference of Large Language Models (LLMs), covering computation, storage, transmission, and hardware roofline mod… (☆311, updated 2 months ago)
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (☆391, updated 3 months ago)
- A PyTorch-native LLM training framework (☆665, updated 2 months ago)
- A throughput-oriented, high-performance serving framework for LLMs (☆636, updated 2 months ago)
- mllm-npu: training multimodal large language models on Ascend NPUs (☆83, updated 2 months ago)
- A parallel VAE that avoids OOM for high-resolution image generation (☆40, updated last month)
- Dynamic Memory Management for Serving LLMs without PagedAttention (☆238, updated last week)
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer (☆340, updated last month)
- FlagScale is a large model toolkit based on open-sourced projects (☆169, updated this week)
- VideoSys: an easy and efficient system for video generation (☆1,775, updated this week)
- InternEvo is an open-sourced, lightweight training framework that aims to support model pre-training without the need for extensive dependencie… (☆310, updated this week)