jt-zhang / Sparse_SageAttention_API
☆20 · Updated this week
Alternatives and similar repositories for Sparse_SageAttention_API
Users interested in Sparse_SageAttention_API are comparing it to the libraries listed below.
- 🤗CacheDiT: A training-free and easy-to-use cache acceleration toolbox for Diffusion Transformers 🔥 ☆61 · Updated this week
- Code for Draft Attention ☆72 · Updated last month
- TVMScript kernel for deformable attention ☆25 · Updated 3 years ago
- CVFusion is an open-source deep learning compiler that fuses OpenCV operators. ☆29 · Updated 2 years ago
- Combining TeaCache with xDiT to accelerate visual generation models ☆25 · Updated 2 months ago
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆42 · Updated 6 months ago
- FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model] ☆24 · Updated 3 weeks ago
- Quantized attention on GPU ☆44 · Updated 7 months ago
- A study of CUTLASS ☆21 · Updated 7 months ago
- ⚡️ HGEMM written from scratch with Tensor Cores using the WMMA, MMA, and CuTe APIs, achieving peak performance ☆80 · Updated last month
- Patch convolution to avoid the large GPU memory usage of Conv2D ☆88 · Updated 5 months ago
- ☆167 · Updated 5 months ago
- IntLLaMA: A fast and lightweight quantization solution for LLaMA ☆18 · Updated last year
- ☆48 · Updated last month
- A parallel VAE that avoids OOM for high-resolution image generation ☆64 · Updated 5 months ago
- ☆29 · Updated 4 months ago
- ☆21 · Updated 2 weeks ago
- ☆60 · Updated 2 months ago
- Implementation of SmoothCache, a project aimed at speeding up Diffusion Transformer (DiT) based GenAI models with error-guided caching ☆44 · Updated 3 months ago
- Step-by-step SGEMM optimization with CUDA ☆19 · Updated last year
- ☆96 · Updated 9 months ago
- Pioneering the training of long-context multi-modal transformer models ☆40 · Updated last week
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend ☆54 · Updated this week
- FP8 flash attention for the Ada architecture, implemented with the cutlass repository ☆71 · Updated 10 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆92 · Updated 3 weeks ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆16 · Updated 2 weeks ago
- Awesome code, projects, books, etc. related to CUDA ☆17 · Updated last week
- An auxiliary project analyzing the characteristics of KV in DiT attention ☆31 · Updated 6 months ago
- ☆77 · Updated last month
- ☆75 · Updated 5 months ago