KuangjuX / AttnLink

An experimental communicating attention kernel based on DeepEP.

☆34

Alternatives and similar repositories for AttnLink

Users who are interested in AttnLink are comparing it to the libraries listed below.

flashinfer-ai / cutlass-viz
☆65 Updated 7 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆73 Updated 6 months ago
tile-ai / AttentionEngine
☆51 Updated 6 months ago
cherichy / tilecute
☆31 Updated 5 months ago
ademeure / cuda-side-boost
☆51 Updated 6 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆144 Updated 2 months ago
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆82 Updated this week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆88 Updated 2 months ago
lemyx / tilelang-dsa
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆32 Updated 2 weeks ago
flashinfer-ai / debug-print
Debug print operator for cudagraph debugging
☆14 Updated last year
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆102 Updated 5 months ago
tile-ai / TileOPs
☆60 Updated last week
zhuzilin / flash-attention-with-sink
☆39 Updated 3 months ago
ACA-Lab-SJTU / token-ring
☆13 Updated 10 months ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆44 Updated last year
ByteDance-Seed / cudaLLM
☆125 Updated 3 months ago
CalvinXKY / mfu_calculation
A simple calculation for LLM MFU.
☆50 Updated 2 months ago
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆70 Updated last week
microsoft / AttentionEngine
☆113 Updated 6 months ago
microsoft / tokenweave
Efficient Compute-Communication Overlap for Distributed LLM Inference
☆63 Updated last month
nex-agi / NexVenusCL
Nex Venus Communication Library
☆59 Updated 2 weeks ago
flagos-ai / libtriton_jit
A Triton JIT runtime and ffi provider in C++
☆29 Updated last month
infinigence / HamiltonAttention
☆34 Updated last month
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆52 Updated last year
Chtholly-Boss / swizzle
A practical way of learning Swizzle
☆33 Updated 10 months ago
LeiWang1999 / Stream-k.tvm
☆19 Updated last year
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆86 Updated this week
HPMLL / NVIDIA-Hopper-Benchmark
☆65 Updated 6 months ago
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆97 Updated 11 months ago
Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.
☆45 Updated 5 months ago