latentCall145 / channels-last-groupnorm
A CUDA kernel for NHWC GroupNorm for PyTorch
☆20, updated 9 months ago
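To illustrate what "NHWC GroupNorm" means on the PyTorch side, here is a minimal sketch using stock PyTorch (not this repo's CUDA kernel): a tensor is put into channels-last (NHWC) memory format and normalized with `torch.nn.GroupNorm`. A dedicated NHWC kernel like the one in this repo can process the data in its native layout rather than falling back to the reference path.

```python
import torch
import torch.nn as nn

# Sketch only: stock PyTorch GroupNorm on a channels-last tensor.
# The shapes and group count here are arbitrary example values.
x = torch.randn(8, 32, 16, 16)                    # logical NCHW shape
x = x.to(memory_format=torch.channels_last)       # physically stored as NHWC
gn = nn.GroupNorm(num_groups=4, num_channels=32)  # 4 groups of 8 channels

y = gn(x)
print(y.shape)  # logical shape is unchanged: (8, 32, 16, 16)
```

GroupNorm normalizes each sample over each group of channels, so after this call the per-sample, per-group mean of `y` is approximately zero (with the default affine parameters initialized to weight 1, bias 0).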
Alternatives and similar repositories for channels-last-groupnorm
Users interested in channels-last-groupnorm are comparing it to the libraries listed below.
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference (☆40, updated 2 months ago)
- Multiple GEMM operators built with CUTLASS to support LLM inference (☆19, updated 3 weeks ago)
- Quantized Attention on GPU (☆44, updated 9 months ago)
- Standalone Flash Attention v2 kernel without a libtorch dependency (☆111, updated 11 months ago)
- GPTQ inference TVM kernel (☆40, updated last year)
- A practical way of learning Swizzle (☆25, updated 6 months ago)
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer (☆94, updated 3 weeks ago)
- An auxiliary project analyzing the characteristics of KV in DiT Attention.