leloykun / flash-hyperbolic-attention-minimal

Flash Hyperbolic Attention in ~[...] lines of CUDA

☆21

Alternatives and similar repositories for flash-hyperbolic-attention-minimal:

Users who are interested in flash-hyperbolic-attention-minimal are comparing it to the libraries listed below.

IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆72 Updated 7 months ago
stanford-futuredata / stk
☆103 Updated 7 months ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆67 Updated last month
gpu-mode / triton-index
Cataloging released Triton kernels.
☆213 Updated 3 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆195 Updated last week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆63 Updated last week
FasterDecoding / TEAL
☆122 Updated last month
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆108 Updated this week
triton-lang / kernels
☆76 Updated 5 months ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆59 Updated 2 months ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆78 Updated 5 months ago
shadowpa0327 / Palu
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
☆95 Updated last month
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆94 Updated last week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆210 Updated 4 months ago
Infini-AI-Lab / Sirius
Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…
☆21 Updated 6 months ago
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆253 Updated 2 weeks ago
dame-cell / Triformer
Transformers components but in Triton
☆32 Updated 3 weeks ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆40 Updated last year
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
☆254 Updated last week
andylolu2 / simpleGEMM
The simplest but fast implementation of matrix multiplication in CUDA.
☆34 Updated 8 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆118 Updated this week
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆166 Updated 10 months ago
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆32 Updated last week
wangsiping97 / FastGEMV
High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline.
☆103 Updated 8 months ago
Cornell-RelaxML / qtip
☆114 Updated 2 weeks ago
ColfaxResearch / cutlass-kernels
☆194 Updated 8 months ago
usyd-fsalab / fp6_llm
An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).
☆245 Updated 5 months ago
gpu-mode / ring-attention
ring-attention experiments
☆129 Updated 5 months ago
zankner / Hydra
☆43 Updated last year
sustcsonglin / fla-tilelang
☆19 Updated last month