xdit-project / DistVAE
A parallel VAE that avoids OOM for high-resolution image generation
☆53 · Updated 3 weeks ago
Alternatives and similar repositories for DistVAE:
Users interested in DistVAE are comparing it to the repositories listed below.
- An auxiliary project analyzing the characteristics of KV in DiT attention. ☆25 · Updated 2 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆55 · Updated last week
- Patch convolution to avoid large GPU memory usage of Conv2D ☆85 · Updated 3 weeks ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆39 · Updated 6 months ago
- Accelerating Diffusion Transformers with Token-wise Feature Caching ☆62 · Updated 2 weeks ago
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆32 · Updated 2 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆19 · Updated 3 weeks ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆92 · Updated 7 months ago
- A sparse attention kernel supporting mixed sparse patterns ☆133 · Updated last week
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆142 · Updated 3 months ago
- mllm-npu: training multimodal large language models on Ascend NPUs ☆90 · Updated 5 months ago
- Context parallel attention that accelerates DiT model inference with dynamic caching ☆189 · Updated this week
- Model Compression Toolbox for Large Language Models and Diffusion Models ☆330 · Updated this week
- 📖 A curated list of Awesome Diffusion Inference Papers with code: Sampling, Caching, Multi-GPUs, etc. 🎉🎉 ☆189 · Updated last month
- (WIP) Parallel inference for black-forest-labs' FLUX model ☆17 · Updated 3 months ago
- FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling. ☆35 · Updated 7 months ago
- Quantized Attention on GPU ☆34 · Updated 2 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆89 · Updated this week
- Official implementation of the ICLR 2024 paper AffineQuant ☆24 · Updated 10 months ago
- 📚 Collection of awesome generation acceleration resources ☆139 · Updated this week
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models ☆94 · Updated 11 months ago
- 16-fold memory access reduction with nearly no loss ☆77 · Updated this week
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance ⚡️ ☆52 · Updated 2 weeks ago
- QuEST: Efficient Finetuning for Low-bit Diffusion Models ☆38 · Updated 3 weeks ago
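Several entries above (DistVAE itself, and the patch-convolution repository) share one underlying trick: process an image in tiles with a small halo overlap, so peak activation memory is bounded by the tile size rather than the full resolution. Below is a minimal pure-Python sketch of that idea, not code from any listed repository; both function names are illustrative, and it assumes a single channel, stride 1, and zero padding.

```python
def conv2d(img, kernel):
    """'Same' 2D convolution with zero padding (single channel, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:  # zero padding: skip out-of-bounds taps
                        acc += img[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out

def conv2d_patched(img, kernel, patch_rows=4):
    """Same result as conv2d, computed slab by slab with halo overlap.

    Each horizontal slab is extended by kernel_height // 2 halo rows on
    either side, convolved independently, and the halo rows of the output
    are discarded, so only one slab's activations are live at a time.
    """
    ph = len(kernel) // 2          # halo size needed for exact results
    h = len(img)
    out = []
    for start in range(0, h, patch_rows):
        end = min(start + patch_rows, h)
        top, bot = max(0, start - ph), min(h, end + ph)   # slab + halo
        slab_out = conv2d(img[top:bot], kernel)
        out.extend(slab_out[start - top : end - top])     # drop halo rows
    return out
```

A parallel VAE can apply the same halo idea spatially across GPUs instead of sequentially: each device holds one patch plus its overlap region, so no single device ever materializes the full-resolution activations.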