flagos-ai / flagtree

FlagTree is a unified compiler for custom deep learning operations that supports multiple AI chip backends. It is forked from triton-lang/triton.

☆200

Alternatives and similar repositories for flagtree

Users who are interested in flagtree are comparing it to the repositories listed below.

QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
☆95 Updated 2 years ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 Updated last year
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆216 Updated last year
MARD1NO / CUDA-PPT
☆118 Updated 10 months ago
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆333 Updated this week
OpenPPL / ppl.llm.kernel.cuda
☆152 Updated last year
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆109 Updated last year
galois-stack / galois
A tensor computing compiler based on tile programming for GPU, CPU, or TPU
☆45 Updated last week
toyaix / TritonLLM
LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model
☆64 Updated 3 months ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆92 Updated 2 years ago
bytedance / byteir
A model compilation solution for various hardware
☆463 Updated 5 months ago
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆148 Updated 8 months ago
openmlir / mlir-tutorial
Hands-On Practical MLIR Tutorial
☆51 Updated 5 months ago
MLIR-China / mlir-playground
Play with MLIR right in your browser
☆138 Updated 2 years ago
Cambricon / mlu-ops
Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
☆150 Updated 2 weeks ago
InfiniTensor / InfiniTensor
☆285 Updated last week
ColfaxResearch / cfx-article-src
☆175 Updated 9 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆325 Updated 2 months ago
nicolaswilde / cuda-tensorcore-hgemm
☆158 Updated last year
gty111 / GEMM_MMA
Optimize GEMM with tensorcore step by step
☆36 Updated 2 years ago
StrongSpoon / tvm.schedule
examples for tvm schedule API
☆101 Updated 2 years ago
ArthurinRUC / cutlass-notes
From Minimal GEMM to Everything
☆101 Updated last month
tlc-pack / relax
☆192 Updated 2 years ago
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆107 Updated this week
DeepLink-org / DLCompiler
triton for dsa
☆57 Updated last week
Archermmt / tvm_walk_through
code reading for tvm
☆76 Updated 4 years ago
gfvvz / triton-learning-materials
Triton Compiler related materials.
☆42 Updated last year
JackonYang / hands-on-tvm
hands on model tuning with TVM and profile it on a Mac M1, x86 CPU, and GTX-1080 GPU.
☆49 Updated 2 years ago
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆120 Updated this week
AlibabaPAI / FLASHNN
☆105 Updated last year