triton-lang / Triton-to-tile-IR
incubator repo for CUDA-TileIR backend
☆102 · Updated this week
Alternatives and similar repositories for Triton-to-tile-IR
Users who are interested in Triton-to-tile-IR are comparing it to the libraries listed below.
- Shared Middle-Layer for Triton Compilation ☆329 · Dec 5, 2025 · Updated 2 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆194 · Updated this week
- SBLP 2025 MLIR Tutorial ☆69 · Feb 8, 2026 · Updated last week
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin. ☆84 · Jan 27, 2026 · Updated 3 weeks ago
- a size profiler for cuda binary ☆72 · Jan 15, 2026 · Updated last month
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆443 · Feb 4, 2026 · Updated last week
- ☆104 · Nov 7, 2024 · Updated last year
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆466 · Dec 31, 2025 · Updated last month
- Helpful kernel tutorials and examples for tile-based GPU programming ☆641 · Updated this week
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration ☆33 · Jan 8, 2026 · Updated last month
- diffusers with search engine ☆12 · Jan 13, 2026 · Updated last month
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference ☆112 · Dec 31, 2025 · Updated last month
- triton for dsa ☆57 · Jan 30, 2026 · Updated 2 weeks ago
- This is the proof-of-concept CPU implementation of ASPEN used for the NeurIPS'23 paper ASPEN: Breaking Operator Barriers for Efficient Pa… ☆13 · Apr 4, 2024 · Updated last year
- ☆18 · Mar 4, 2025 · Updated 11 months ago
- A Triton-only attention backend for vLLM ☆23 · Updated this week
- An extension of TVMScript to write simple and high performance GPU kernels with tensorcore. ☆51 · Jul 23, 2024 · Updated last year
- GitHub mirror of triton-lang/triton repo. ☆137 · Updated this week
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU" ☆31 · Jul 7, 2020 · Updated 5 years ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆105 · Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆326 · Updated this week
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs ☆1,926 · Updated this week
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 · Jun 21, 2019 · Updated 6 years ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆583 · Feb 6, 2026 · Updated last week
- ☆32 · Jul 17, 2024 · Updated last year
- ☆18 · May 8, 2021 · Updated 4 years ago
- A lightweight triton-based General Matrix Multiplication (GEMM) library. ☆44 · Updated this week
- CUDA Template Functions ☆20 · Dec 16, 2025 · Updated 2 months ago
- ☆16 · Sep 24, 2024 · Updated last year
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆44 · Updated this week
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration ☆15 · Sep 14, 2020 · Updated 5 years ago
- Noisy language compiler ☆17 · Jul 31, 2024 · Updated last year
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆829 · Jan 14, 2026 · Updated last month
- Tile-Based Runtime for Ultra-Low-Latency LLM Inference ☆567 · Jan 26, 2026 · Updated 3 weeks ago
- Collection of kernels written in Triton language ☆177 · Jan 27, 2026 · Updated 3 weeks ago
- Distributed Compiler based on Triton for Parallel Systems ☆1,358 · Updated this week
- A Triton JIT runtime and ffi provider in C++ ☆31 · Jan 26, 2026 · Updated 3 weeks ago
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang ☆43 · Nov 19, 2025 · Updated 2 months ago
- ☆18 · Oct 15, 2020 · Updated 5 years ago