dame-cell / TriformerLinks

Transformers components but in Triton

☆34

Alternatives and similar repositories for Triformer

Users that are interested in Triformer are comparing it to the libraries listed below

Sorting:

feifeibear / ChituAttention
Quantized Attention on GPU
☆44Updated 11 months ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆51Updated 4 months ago
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆85Updated last month
microsoft / AttentionEngine
☆106Updated 5 months ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78Updated last year
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22Updated last year
Dao-AILab / grouped-latent-attention
☆130Updated 5 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆85Updated last year
tile-ai / AttentionEngine
☆50Updated 5 months ago
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆64Updated this week
ByteDance-Seed / cudaLLM
☆120Updated 2 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆131Updated 11 months ago
cassiewilliam / cuda_op_benchmark
方便扩展的Cuda算子理解和优化框架，仅用在学习使用
☆18Updated last year
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆51Updated last year
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆206Updated 5 months ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆124Updated 4 months ago
FasterDecoding / TEAL
☆147Updated 9 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆73Updated 6 months ago
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆98Updated this week
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆17Updated 11 months ago
HanGuo97 / hilt
☆36Updated last week
feifeibear / DPSKV3MFU
Estimate MFU for DeepSeekV3
☆26Updated 10 months ago
SqueezeAILab / SqueezedAttention
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆54Updated 11 months ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆36Updated 6 months ago
zhuzilin / flash-attention-with-sink
☆39Updated 3 months ago
tridao / flash-attention-wheels
☆57Updated last year
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆74Updated 8 months ago
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆50Updated last year
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆169Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆127Updated 5 months ago