incubator repo for CUDA-TileIR backend
☆124Mar 18, 2026Updated last week
Alternatives and similar repositories for Triton-to-tile-IR
Users who are interested in Triton-to-tile-IR are comparing it to the libraries listed below.
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base…☆881Mar 24, 2026Updated last week
- a size profiler for cuda binary☆71Jan 15, 2026Updated 2 months ago
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆84Updated this week
- Helpful kernel tutorials and examples for tile-based GPU programming☆683Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆197Updated this week
- GitHub mirror of triton-lang/triton repo.☆152Updated this week
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.☆462Updated this week
- A Triton-only attention backend for vLLM☆24Mar 17, 2026Updated last week
- Shared Middle-Layer for Triton Compilation☆330Dec 5, 2025Updated 3 months ago
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs☆1,990Mar 21, 2026Updated last week
- Code snippets and reproductions from JustAByte☆28Jan 25, 2026Updated 2 months ago
- SBLP 2025 MLIR Tutorial☆73Updated this week
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang☆44Nov 19, 2025Updated 4 months ago
- triton for dsa☆60Updated this week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…☆492Updated this week
- A Triton JIT runtime and ffi provider in C++☆32Mar 24, 2026Updated last week
- Accelerating MoE with IO and Tile-aware Optimizations☆614Updated this week
- Collection of kernels written in Triton language☆185Jan 27, 2026Updated 2 months ago
- ☆18Mar 4, 2025Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆335Updated this week
- ☆33Jul 17, 2024Updated last year
- ☆109Mar 12, 2026Updated 2 weeks ago
- An extension of TVMScript to write simple and high performance GPU kernels with tensorcore.☆51Jul 23, 2024Updated last year
- ☆52May 19, 2025Updated 10 months ago
- Distributed Compiler based on Triton for Parallel Systems☆1,398Mar 11, 2026Updated 2 weeks ago
- A lightweight triton-based General Matrix Multiplication (GEMM) library.☆56Updated this week
- Learning TileLang with 10 puzzles!☆160Updated this week
- Wave: Python Domain-Specific Language for High Performance Machine Learning☆49Updated this week
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration☆39Jan 8, 2026Updated 2 months ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference☆117Mar 7, 2026Updated 3 weeks ago
- ☆310Mar 22, 2026Updated last week
- diffusers with search engine☆12Jan 13, 2026Updated 2 months ago
- Tile-Based Runtime for Ultra-Low-Latency LLM Inference☆690Mar 8, 2026Updated 3 weeks ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 6 years ago
- HunyuanDiT with TensorRT and libtorch☆18May 22, 2024Updated last year
- ☆36Mar 7, 2025Updated last year
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.☆810Updated this week
- TensaLang is a Tensor-first programming language, compiler, and runtime that lets you write the Model’s inference engine (e.g. LLMs) and s…☆74Feb 20, 2026Updated last month
- ☆68Jan 18, 2026Updated 2 months ago