deepseek-ai/TileKernels

deepseek-ai / TileKernels

A kernel library written in tilelang

☆1,642

Alternatives and similar repositories for TileKernels

Users who are interested in TileKernels are comparing it to the libraries listed below.

tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,667 · Updated this week
QwenLM / FlashQLA
high-performance linear attention kernel library built on TileLang
☆597 · Updated this week
MoonshotAI / FlashKDA
FlashKDA: high-performance Kimi Delta Attention kernels
☆455 · May 26, 2026 · Updated last month
inclusionAI / cuLA
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
☆533 · Updated this week
tile-ai / TileOPs
High-performance LLM operator library built on TileLang.
☆161 · Updated this week
tile-ai / tilelang-puzzles
Learning TileLang with 10 puzzles!
☆338 · May 28, 2026 · Updated last month
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆731 · Jul 4, 2026 · Updated 2 weeks ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,060 · Updated this week
Tencent / hpc-ops
High Performance LLM Inference Operator Library
☆1,036 · Jul 2, 2026 · Updated 2 weeks ago
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆172 · Jun 16, 2026 · Updated last month
sablin39 / tilelang-cuda-skills
Skills for writing tilelang and debugging with CUDA toolkits.
☆131 · May 20, 2026 · Updated 2 months ago
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,573 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,983 · Updated this week
NVIDIA / cutile-python
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
☆2,118 · Updated this week
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,493 · Jul 11, 2026 · Updated last week
mit-han-lab / kernel-design-agents
☆754 · Jun 2, 2026 · Updated last month
lightseekorg / tokenspeed
TokenSpeed is a speed-of-light LLM inference engine.
☆1,629 · Updated this week
NVIDIA / TileGym
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
☆773 · Updated this week
NVIDIA / SOL-ExecBench
A benchmark of real-world DL kernel problems
☆257 · Updated this week
fla-org / flash-linear-attention
🚀 Efficient implementations for emerging model architectures
☆5,367 · Updated this week
sgl-project / mini-sglang
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
☆4,598 · May 17, 2026 · Updated 2 months ago
mit-han-lab / KernelWiki
☆309 · Jun 9, 2026 · Updated last month
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆251 · Jun 21, 2026 · Updated 3 weeks ago
BBuf / KDA-Pilot
☆229 · Updated this week
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆169 · Dec 9, 2025 · Updated 7 months ago
technillogue / ptx-isa-markdown
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
☆215 · Dec 24, 2025 · Updated 6 months ago
mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,377 · Updated this week
open-lm-engine / coda-kernels
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
☆229 · Updated this week
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆489 · Jul 5, 2026 · Updated 2 weeks ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆107 · Jul 3, 2026 · Updated 2 weeks ago
radixark / miles
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆1,754 · Updated this week
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,533 · Updated this week
mlc-ai / modern-gpu-programming-for-mlsys
A tutorial on modern GPU programming for machine learning systems
☆1,008 · Updated this week
aikitoria / nanotrace
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137 · Updated this week
mit-han-lab / flash-moba
☆250 · Nov 19, 2025 · Updated 8 months ago
mlc-ai / pith-train
Compact and Agent-Native MoE Training System
☆288 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
deepseek-ai / LPLB
An early research stage expert-parallel load balancer for MoE models based on linear programming.
☆515 · Nov 19, 2025 · Updated 8 months ago