incubator repo for CUDA-TileIR backend
☆112Mar 4, 2026Updated this week
Alternatives and similar repositories for Triton-to-tile-IR
Users that are interested in Triton-to-tile-IR are comparing it to the libraries listed below
- Shared Middle-Layer for Triton Compilation☆331Dec 5, 2025Updated 3 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆196Updated this week
- SBLP 2025 MLIR Tutorial☆70Feb 8, 2026Updated last month
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆84Feb 26, 2026Updated last week
- a size profiler for cuda binary☆72Jan 15, 2026Updated last month
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.☆446Updated this week
- ☆105Nov 7, 2024Updated last year
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…☆471Feb 28, 2026Updated last week
- Helpful kernel tutorials and examples for tile-based GPU programming☆659Mar 3, 2026Updated last week
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration☆36Jan 8, 2026Updated 2 months ago
- diffusers with search engine☆12Jan 13, 2026Updated last month
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base…☆860Feb 24, 2026Updated 2 weeks ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference☆112Updated this week
- triton for dsa☆58Mar 3, 2026Updated last week
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs☆1,951Mar 3, 2026Updated last week
- TensaLang is a Tensor-first programming language, compiler, and runtime that lets you write the Model’s inference engine (e.g. LLMs) and s…☆71Feb 20, 2026Updated 2 weeks ago
- This is the proof-of-concept CPU implementation of ASPEN used for the NeurIPS'23 paper ASPEN: Breaking Operator Barriers for Efficient Pa…☆13Apr 4, 2024Updated last year
- ☆18Mar 4, 2025Updated last year
- An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.☆50Jul 23, 2024Updated last year
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"☆31Jul 7, 2020Updated 5 years ago
- GitHub mirror of the triton-lang/triton repo.☆150Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo.☆105Mar 3, 2026Updated last week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆329Updated this week
- Automated GPU Kernel Generation via Co-Evolving Intrinsic World Model☆52Mar 2, 2026Updated last week
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 6 years ago
- A Triton-only attention backend for vLLM☆24Feb 11, 2026Updated 3 weeks ago
- ☆33Jul 17, 2024Updated last year
- Accelerating MoE with IO and Tile-aware Optimizations☆597Feb 27, 2026Updated last week
- ☆18May 8, 2021Updated 4 years ago
- CUDA Template Functions☆20Dec 16, 2025Updated 2 months ago
- ☆16Sep 24, 2024Updated last year
- Noisy language compiler☆17Jul 31, 2024Updated last year
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration☆15Sep 14, 2020Updated 5 years ago
- Tile-Based Runtime for Ultra-Low-Latency LLM Inference☆675Feb 27, 2026Updated last week
- Collection of kernels written in Triton language☆181Jan 27, 2026Updated last month
- Distributed Compiler based on Triton for Parallel Systems☆1,380Feb 13, 2026Updated 3 weeks ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning☆45Mar 3, 2026Updated last week
- ☆32Jul 2, 2025Updated 8 months ago
- A Triton JIT runtime and ffi provider in C++☆32Updated this week