FlagTree / flagtree

FlagTree is a unified compiler for multiple AI chips, forked from triton-lang/triton.

☆129

Alternatives and similar repositories for flagtree

Users who are interested in flagtree are comparing it to the libraries listed below.

TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆186 Updated 9 months ago
OpenPPL / ppl.llm.kernel.cuda
☆152 Updated 10 months ago
QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
☆90 Updated 2 years ago
MARD1NO / CUDA-PPT
☆110 Updated 7 months ago
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆204 Updated 9 months ago
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆247 Updated 4 months ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆92 Updated 2 years ago
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆109 Updated last year
CalebDu / Awesome-Cute
☆108 Updated 5 months ago
nicolaswilde / cuda-tensorcore-hgemm
☆156 Updated 10 months ago
ColfaxResearch / cfx-article-src
☆154 Updated 6 months ago
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆167 Updated this week
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆124 Updated 6 months ago
Archermmt / tvm_walk_through
code reading for tvm
☆76 Updated 3 years ago
InfiniTensor / InfiniTensor
☆268 Updated 2 weeks ago
AlibabaPAI / FLASHNN
☆101 Updated last year
Cambricon / torch_mlu
☆44 Updated 7 months ago
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 Updated 2 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆302 Updated 2 weeks ago
toyaix / TritonLLM
LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model
☆54 Updated 3 weeks ago
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆82 Updated this week
reed-lau / cute-gemm
☆138 Updated 11 months ago
openmlir / mlir-tutorial
Hands-On Practical MLIR Tutorial
☆40 Updated 2 months ago
OpenPPL / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆83 Updated 2 years ago
DeepLink-org / DLOP-Bench
A benchmark suited especially for deep learning operators
☆42 Updated 2 years ago
gty111 / GEMM_MMA
Optimize GEMM with tensorcore step by step
☆32 Updated last year
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆390 Updated last month
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆183 Updated last month
FlagOpen / FlagCX
☆125 Updated this week
zeroine / cutlass-cute-sample
☆47 Updated last year