arnavdantuluri / StableTriton
The first open-source Triton inference engine for Stable Diffusion, specifically for SDXL
☆12 · Updated last year
Alternatives and similar repositories for StableTriton
Users that are interested in StableTriton are comparing it to the libraries listed below
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆44 · Updated 9 months ago
- https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching ☆366 · Updated last month
- Model Compression Toolbox for Large Language Models and Diffusion Models ☆614 · Updated 3 weeks ago
- A parallel VAE that avoids OOM for high-resolution image generation ☆76 · Updated 3 weeks ago
- TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators. ☆19 · Updated last year
- 🤗 A Unified Cache Acceleration Toolbox for DiTs: Qwen-Image, Qwen-Image-Edit, FLUX.1, Wan2.1/2.2, etc. ☆229 · Updated this week
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models. ☆354 · Updated last year
- Faster generation with text-to-image diffusion models. ☆226 · Updated 2 months ago
- High performance inference engine for diffusion models ☆81 · Updated 3 weeks ago
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models ☆702 · Updated 9 months ago
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models ☆101 · Updated last year
- Combining Teacache with xDiT to Accelerate Visual Generation Models ☆31 · Updated 4 months ago
- Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community… ☆560 · Updated last year
- A toolkit for developers to simplify the transformation of nn.Module instances. It now corresponds to Pytorch.fx. ☆13 · Updated 2 years ago
- https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs. ☆1,284 · Updated 5 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆116 · Updated 5 months ago
- stable diffusion, controlnet, tensorrt, accelerate ☆58 · Updated 2 years ago
- ☆173 · Updated 7 months ago
- A CUDA kernel for NHWC GroupNorm for PyTorch ☆20 · Updated 9 months ago
- Implementation of Post-training Quantization on Diffusion Models (CVPR 2023) ☆139 · Updated 2 years ago
- End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training). ☆378 · Updated 3 months ago
- ☆75 · Updated 8 months ago
- A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24] ☆296 · Updated last year
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free ☆926 · Updated last year
- ☆26 · Updated 2 years ago
- ☆30 · Updated 8 months ago
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 · Updated 9 months ago
- NART = NART is not A RunTime, a deep learning inference framework. ☆37 · Updated 2 years ago
- Flux diffusion model implementation using quantized fp8 matmul & remaining layers use faster half precision accumulate, which is ~2x fast… ☆274 · Updated 10 months ago
- [NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising ☆205 · Updated 6 months ago