arnavdantuluri / StableTriton
The first open-source Triton inference engine for Stable Diffusion, specifically for SDXL
☆12Updated 2 years ago
Alternatives and similar repositories for StableTriton
Users who are interested in StableTriton are comparing it to the libraries listed below.
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization☆47Updated last year
- A parallel VAE that avoids OOM for high-resolution image generation☆84Updated 4 months ago
- https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching☆396Updated 5 months ago
- Model Compression Toolbox for Large Language Models and Diffusion Models☆706Updated 3 months ago
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.☆365Updated last year
- TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.☆20Updated last year
- High performance inference engine for diffusion models☆96Updated 3 months ago
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models☆102Updated last year
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models☆713Updated last year
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation☆139Updated 8 months ago
- Faster generation with text-to-image diffusion models.☆231Updated 5 months ago
- stable diffusion, controlnet, tensorrt, accelerate☆58Updated 2 years ago
- ☆187Updated 10 months ago
- Implementation of Post-training Quantization on Diffusion Models (CVPR 2023)☆140Updated 2 years ago
- 🤗A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs: Z-Image, FLUX2, Qwen-Image, etc.☆676Updated this week
- A toolkit for developers to simplify the transformation of nn.Module instances. It now corresponds to torch.fx.☆13Updated 2 years ago
- Patch convolution to avoid large GPU memory usage of Conv2D☆93Updated 10 months ago
- Combining TeaCache with xDiT to Accelerate Visual Generation Models☆32Updated 7 months ago
- Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community…☆560Updated 2 years ago
- An auxiliary project analyzing the characteristics of KV in DiT Attention.☆32Updated last year
- 📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉☆454Updated last week
- ☆166Updated 2 years ago
- A CUDA kernel for NHWC GroupNorm for PyTorch☆21Updated last year
- 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …☆92Updated 3 months ago
- Using TVM to deploy Transformer models on CPU and GPU☆11Updated 4 years ago
- QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.☆148Updated 3 months ago
- High Performance Int8 GEMM Kernels for SM80 and later GPUs.☆18Updated 8 months ago
- ☆27Updated 2 years ago
- ☆59Updated 4 months ago
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…☆49Updated 2 years ago