NVIDIA / tilus

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

☆408

Alternatives and similar repositories for tilus

Users interested in tilus are comparing it to the libraries listed below.

HazyResearch / Megakernels
kernels, of the mega variety
☆614 Updated 2 months ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆640 Updated this week
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆402 Updated 3 weeks ago
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆175 Updated last week
HazyResearch / HipKittens
Fast and Furious AMD Kernels
☆309 Updated last week
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆164 Updated 3 weeks ago
triton-lang / kernels
☆94 Updated last year
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆119 Updated last week
Deep-Learning-Profiling-Tools / triton-viz
☆256 Updated last week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆675 Updated last week
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆294 Updated last week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆401 Updated 2 weeks ago
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆164 Updated this week
cchan / tccl
extensible collectives library in triton
☆91 Updated 8 months ago
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆211 Updated this week
pranjalssh / fast.cu
Fastest kernels written from scratch
☆400 Updated 2 months ago
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆534 Updated 3 weeks ago
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆100 Updated this week
meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆124 Updated 3 weeks ago
openxla / shardy
MLIR-based partitioning system
☆151 Updated this week
IST-DASLab / qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆143 Updated 3 weeks ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆172 Updated 8 months ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆102 Updated 5 months ago
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆112 Updated 10 months ago
ColfaxResearch / layout-categories
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
☆81 Updated 2 months ago
bertmaher / simplegemm
☆127 Updated last month
wangsiping97 / FastGEMV
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
☆123 Updated last year
ROCm / aiter
AI Tensor Engine for ROCm
☆309 Updated this week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆308 Updated 3 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆316 Updated last month