MooreThreads/tilelang_musa

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

☆56

Alternatives and similar repositories for tilelang_musa

Users that are interested in tilelang_musa are comparing it to the libraries listed below.

cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
MooreThreads / gpu-compute-driver-bench
☆19 · Updated this week
tile-ai / TileOPs
High-performance LLM operator library built on TileLang.
☆163 · Updated this week
tile-ai / tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆20 · Updated this week
Triang-jyed-driung / i8muon
Muon in Int8 Precision Made Possible
☆20 · Jun 18, 2026 · Updated last month
serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago
MooreThreads / torchada
An adapter layer that ensures torch_musa🔦 delivers a CUDA-compatible PyTorch experience.
☆37 · Updated this week
MooreThreads / mutlass
MUSA Templates for Linear Algebra Subroutines
☆47 · Jun 22, 2026 · Updated last month
GeeeekExplorer / kkbot
A Feishu/Lark AI agent bot
☆15 · Feb 27, 2026 · Updated 4 months ago
lcy-seso / DLFrameworkTest
My tests and experiments with some popular dl frameworks.
☆17 · Sep 11, 2025 · Updated 10 months ago
tsinghua-ideal / Syno
Source code repository for ASPLOS '25 paper "Syno: Structured Synthesis for Neural Operators"
☆15 · Aug 31, 2025 · Updated 10 months ago
aikitoria / nanotrace
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137 · Jul 17, 2026 · Updated last week
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆176 · Jun 16, 2026 · Updated last month
lemyx / tilelang-dsa
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47 · Nov 19, 2025 · Updated 8 months ago
mlc-ai / mlc-python
☆36 · Jul 19, 2025 · Updated last year
RightNow-AI / AutoMegaKernel
An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch…
☆120 · Jun 29, 2026 · Updated 3 weeks ago
flagos-ai / libtriton_jit
A Triton JIT runtime and ffi provider in C++
☆37 · Updated this week
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32 · Updated this week
tile-ai / tilelang-ascend
Ascend TileLang adapter
☆338 · Updated this week
tile-ai / TileFoundry
☆54 · Updated this week
inclusionAI / humming
☆166 · Updated this week
LeiWang1999 / TVM.CMakeExtend
Tutorials of Extending and importing TVM with CMAKE Include dependency.
☆16 · Oct 11, 2024 · Updated last year
thautwarm / LLAST
A high level LLVM IR AST provider for GraphEngine JIT.
☆22 · Sep 9, 2018 · Updated 7 years ago
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
HydraQYH / hp_rms_norm
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆30 · Jan 22, 2026 · Updated 6 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
MooreThreads / vllm-musa
A high-throughput and memory-efficient inference and serving engine for LLMs
☆109 · Updated this week
tile-ai / tilelang-puzzles
Learning TileLang with 10 puzzles!
☆349 · May 28, 2026 · Updated last month
mlc-ai / tirx-kernels
ML kernels and benchmarking infrastructure written in TIRx
☆70 · Updated this week
triton-lang / Triton-to-tile-IR
incubator repo for CUDA-TileIR backend
☆149 · Jul 10, 2026 · Updated 2 weeks ago
flagos-ai / FlagTree
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang…
☆303 · Updated this week
tile-ai / tilelang-benchmark
☆22 · Jun 10, 2026 · Updated last month
NVIDIA / hoti-2025-gpu-comms-tutorial
Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025
☆32 · Oct 22, 2025 · Updated 9 months ago
microsoft / AttentionEngine
☆123 · May 19, 2025 · Updated last year
xlite-dev / qwen-image-fast
⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs
☆17 · Oct 24, 2025 · Updated 9 months ago
xiaoyu1998 / llvm-cpu0
LLVM Backend tutorial Cpu0
☆26 · Nov 5, 2023 · Updated 2 years ago
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend
☆127 · May 18, 2026 · Updated 2 months ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year