tile-ai / tilelang-ascendLinks

Ascend TileLang adapter

☆146

Alternatives and similar repositories for tilelang-ascend

Users that are interested in tilelang-ascend are comparing it to the libraries listed below

Sorting:

DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆249Updated 4 months ago
OpenPPL / ppl.llm.kernel.cuda
☆152Updated 10 months ago
CalebDu / Awesome-Cute
☆111Updated 6 months ago
reed-lau / cute-gemm
☆144Updated last week
AlibabaPAI / FLASHNN
☆102Updated last year
KnowingNothing / MatmulTutorial
A Easy-to-understand TensorOp Matmul Tutorial
☆393Updated last month
nicolaswilde / cuda-tensorcore-hgemm
☆156Updated 10 months ago
MARD1NO / CUDA-PPT
☆112Updated 7 months ago
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆204Updated 9 months ago
OpenPPL / ppl.nn.llm
☆139Updated last year
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆356Updated 3 years ago
DeepLink-org / DLOP-Bench
A benchmark suited especially for deep learning operators
☆42Updated 2 years ago
weishengying / cutlass_flash_atten_fp8
使用 cutlass 仓库在 ada 架构上实现 fp8 的 flash attention
☆78Updated last year
sunkx109 / GPUs-Specs
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM
☆64Updated 3 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆187Updated last month
ColfaxResearch / cfx-article-src
☆154Updated 6 months ago
AyakaGEMM / Hands-on-GEMM
☆143Updated last year
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆142Updated 10 months ago
nicolaswilde / cuda-sgemm
☆70Updated 10 months ago
flagos-ai / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆284Updated last year
flagos-ai / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆763Updated this week
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆96Updated 11 months ago
66RING / tiny-flash-attention
flash attention tutorial written in python, triton, cuda, cutlass
☆448Updated 6 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆143Updated 2 months ago
sgl-project / sgl-kernel-npu
SGLang kernel library for NPU
☆73Updated this week
pzhao-eng / FlashMLA
☆59Updated 4 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆119Updated last month
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆66Updated last year
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆80Updated last week
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆167Updated last week