Dao-AILab / gemm-cublas
☆22 · Updated 5 months ago
Alternatives and similar repositories for gemm-cublas
Users interested in gemm-cublas are comparing it to the libraries listed below.
- Odysseus: Playground of LLM Sequence Parallelism · ☆77 · Updated last year
- Accelerate LLM preference tuning via prefix sharing with a single line of code · ☆43 · Updated 3 months ago
- ☆129 · Updated 4 months ago
- The evaluation framework for training-free sparse attention in LLMs · ☆100 · Updated 3 months ago
- ☆50 · Updated 4 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆119 · Updated 3 months ago
- A bunch of kernels that might make stuff slower 😉 · ☆59 · Updated this week
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ☆189 · Updated 3 months ago
- Quantized Attention on GPU · ☆44 · Updated 10 months ago
- ☆99 · Updated 4 months ago
- ☆251 · Updated 4 months ago
- Framework to reduce autotune overhead to zero for well known deployments. · ☆84 · Updated 3 weeks ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. · ☆73 · Updated last year
- Transformers components but in Triton · ☆34 · Updated 5 months ago
- Awesome Triton Resources · ☆36 · Updated 5 months ago
- Benchmark tests supporting the TiledCUDA library. · ☆17 · Updated 10 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… · ☆24 · Updated last week
- Experiment of using Tangent to autodiff triton · ☆80 · Updated last year
- FlexAttention w/ FlashAttention3 Support · ☆27 · Updated last year
- ring-attention experiments · ☆152 · Updated 11 months ago
- ☆32 · Updated last year
- ☆33 · Updated this week
- ☆113 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … · ☆60 · Updated last year
- Linear Attention Sequence Parallelism (LASP) · ☆87 · Updated last year
- Triton implementation of FlashAttention2 that adds Custom Masks. · ☆138 · Updated last year
- ☆57 · Updated last year
- ☆39 · Updated 2 months ago
- Fast and memory-efficient exact attention · ☆70 · Updated 7 months ago
- 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× … · ☆86 · Updated last month