KONAKONA666 / q8_kernelsLinks

☆76

Alternatives and similar repositories for q8_kernels

Users that are interested in q8_kernels are comparing it to the libraries listed below

Sorting:

sandyresearch / chipmunk
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …
☆87Updated last month
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆18Updated 11 months ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆33Updated 8 months ago
aredden / torch-cublas-hgemm
PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu
☆75Updated 10 months ago
huggingface / flux-fast
Making Flux go brrr on GPUs.
☆150Updated 3 months ago
timudk / flux_triton
Writing FLUX in Triton
☆41Updated last year
xdit-project / DistVAE
A parallelism VAE avoids OOM for high resolution image generation
☆81Updated 2 months ago
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆52Updated last year
xdit-project / DiTCacheAnalysis
An auxiliary project analysis of the characteristics of KV in DiT Attention.
☆32Updated 11 months ago
mit-han-lab / patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D
☆93Updated 9 months ago
vipshop / cache-dit
A Unified Cache Acceleration Framework for 🤗Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, Wan, FLUX, etc.
☆460Updated this week
IST-DASLab / Quartet
☆103Updated this week
Cornell-RelaxML / qtip
☆152Updated 4 months ago
chengzeyi / ParaAttention
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching
☆385Updated 3 months ago
xdit-project / mochi-xdit
faster parallel inference of mochi-1 video generation model
☆125Updated 8 months ago
sayakpaul / diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
☆381Updated 5 months ago
microsoft / AttentionEngine
☆102Updated 5 months ago
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆183Updated this week
mit-han-lab / VisCompare
A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders
☆24Updated 8 months ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59Updated last year
IST-DASLab / qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆120Updated last week
thu-nics / DiTFastAttn
☆182Updated 9 months ago
czg1225 / AsyncDiff
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
☆206Updated last month
mit-han-lab / radial-attention
[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation
☆531Updated last month
fal-ai-community / NativeSparseAttention
research impl of Native Sparse Attention (2502.11089)
☆62Updated 8 months ago
tridao / flash-attention-wheels
☆57Updated last year
nunchaku-tech / deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
☆678Updated 2 months ago
ByteDance-Seed / cudaLLM
☆120Updated 2 months ago
huggingface / diffusion-fast
Faster generation with text-to-image diffusion models.
☆228Updated 4 months ago
hao-ai-lab / Awesome-Video-Attention
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach…
☆41Updated last month