ColfaxResearch / layout-categories
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
☆83Updated 2 months ago
Alternatives and similar repositories for layout-categories
Users who are interested in layout-categories are comparing it to the libraries listed below.
- GitHub mirror of the triton-lang/triton repo.☆105Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming☆133Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆105Updated 5 months ago
- Extensible collectives library in Triton☆91Updated 8 months ago
- ☆52Updated 7 months ago
- ☆97Updated last year
- ☆110Updated last year
- ☆253Updated last year
- ☆151Updated 11 months ago
- CUTLASS and CuTe Examples☆112Updated 2 weeks ago
- ☆163Updated 7 months ago
- Building the Virtuous Cycle for AI-driven LLM Systems☆98Updated this week
- MLIR-based partitioning system☆151Updated this week
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆190Updated 10 months ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…☆415Updated last month
- A lightweight design for computation-communication overlap.☆196Updated 2 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆178Updated last week
- Autonomous GPU Kernel Generation via Deep Agents☆187Updated this week
- An experimental CPU backend for Triton☆167Updated last month
- Open ABI and FFI for Machine Learning Systems☆236Updated last week
- Low overhead tracing library and trace visualizer for pipelined CUDA kernels☆127Updated 3 weeks ago
- ☆69Updated 6 months ago
- Tutorials for NVIDIA CUPTI samples☆44Updated last month
- ☆39Updated this week
- NVIDIA cuTile learn☆119Updated last week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆301Updated this week
- Artifacts of EVT ASPLOS'24☆28Updated last year
- Framework to reduce autotune overhead to zero for well known deployments.☆90Updated 3 months ago
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆78Updated last week
- Collection of kernels written in Triton language☆172Updated 8 months ago