Libraries-Openly-Fused / FusedKernelLibrary
Implementation of a methodology that enables all kinds of user-defined GPU kernel fusion for non-CUDA programmers.
☆23 · Updated this week
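For readers unfamiliar with the idea, the sketch below illustrates what GPU kernel fusion means in general: several elementwise steps (here scale, bias, and ReLU, chosen purely for illustration) are combined into a single CUDA kernel so intermediate results never round-trip through global memory. This is a generic, hand-written example, not the FusedKernelLibrary API.

```cuda
// Illustrative sketch only: generic kernel fusion, not the FusedKernelLibrary API.
// An unfused pipeline would launch one kernel per step (scale, bias, ReLU);
// the fused kernel below performs all three per element in a single pass.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fused_scale_bias_relu(const float* in, float* out,
                                      float scale, float bias, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i] * scale + bias;   // scale + bias in registers
        out[i] = v > 0.0f ? v : 0.0f;     // ReLU, written out once
    }
}

int main() {
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    // (Initialization of d_in omitted for brevity.)
    int threads = 256, blocks = (n + threads - 1) / threads;
    fused_scale_bias_relu<<<blocks, threads>>>(d_in, d_out, 2.0f, 1.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_in);
    cudaFree(d_out);
    printf("fused kernel launched over %d elements\n", n);
    return 0;
}
```

Fusing this way avoids the extra kernel launches and the global-memory traffic of storing intermediates between separate kernels, which is the kind of hand-written CUDA work the project aims to spare non-CUDA programmers.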
Alternatives and similar repositories for FusedKernelLibrary
Users interested in FusedKernelLibrary are comparing it to the libraries listed below.
- 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× … ☆84 · Updated last week
- ☆96 · Updated 3 months ago
- Quantized Attention on GPU ☆44 · Updated 9 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆93 · Updated 2 months ago
- ☆76 · Updated 8 months ago
- ☆92 · Updated 3 weeks ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆92 · Updated last week
- Framework to reduce autotune overhead to zero for well-known deployments. ☆81 · Updated 2 weeks ago
- A parallel framework for training deep neural networks ☆63 · Updated 6 months ago
- TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer Generator (WIP) for Triton Kernels ☆150 · Updated this week
- ☆106 · Updated last month
- An auxiliary project analyzing the characteristics of KV in DiT Attention. ☆32 · Updated 9 months ago
- [WIP] Better (FP8) attention for Hopper