Ascend/triton-ascend

Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend

☆127

Alternatives and similar repositories for triton-ascend

Users who are interested in triton-ascend are comparing it to the libraries listed below.

tile-ai / tilelang-ascend
Ascend TileLang adapter
☆335 · Updated this week
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
sgl-project / sgl-kernel-npu
SGLang kernel library for NPU
☆166 · Updated this week
Ascend / torchair
☆26 · Jun 8, 2026 · Updated last month
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
flashinfer-ai / cubloaty
a size profiler for cuda binary
☆71 · Jan 15, 2026 · Updated 6 months ago
serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago
Ascend / triton-ascend-ops
☆22 · Jun 29, 2026 · Updated 3 weeks ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,495 · Updated this week
tile-ai / TileOPs
High-performance LLM operator library built on TileLang.
☆161 · Updated this week
microsoft / AttentionEngine
☆123 · May 19, 2025 · Updated last year
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
flagos-ai / libtriton_jit
A Triton JIT runtime and ffi provider in C++
☆37 · Updated this week
DeepLink-org / DLCompiler
triton for dsa
☆68 · Jul 10, 2026 · Updated last week
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆242 · Jan 20, 2026 · Updated 6 months ago
Ascend / pytorch
Ascend PyTorch adapter (torch_npu). Mirror of https://gitcode.com/Ascend/pytorch
☆552 · Updated this week
AlibabaPAI / FLASHNN
☆106 · Sep 9, 2024 · Updated last year
triton-lang / Triton-to-tile-IR
incubator repo for CUDA-TileIR backend
☆148 · Jul 10, 2026 · Updated last week
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆912 · Updated this week
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆340 · Dec 5, 2025 · Updated 7 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,344 · Aug 28, 2025 · Updated 10 months ago
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆435 · Updated this week
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆169 · Dec 9, 2025 · Updated 7 months ago
tile-ai / tilelang-mlir-ascend
MLIR-based TileLang Ascend Adapter
☆21 · Updated this week
flagos-ai / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆1,053 · Updated this week
flagos-ai / FlagTree
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang…
☆299 · Updated this week
triton-lang / triton-ext
A collection of out-of-tree extensions for the Triton language and compiler
☆30 · Updated this week
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆174 · Jun 16, 2026 · Updated last month
feifeibear / ChituAttention
Quantized Attention on GPU
☆45 · Nov 22, 2024 · Updated last year
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆362 · Updated this week
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆221 · Feb 7, 2025 · Updated last year
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆561 · Updated this week
nicolaswilde / amx-gemm-handwritten
Handwritten GEMM using Intel AMX (Advanced Matrix Extension)
☆17 · Jan 11, 2025 · Updated last year
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆100 · Jan 16, 2026 · Updated 6 months ago