NVIDIA / cuda-tile
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA tensor core units.
☆763 Updated 3 weeks ago
Alternatives and similar repositories for cuda-tile
Users who are interested in cuda-tile are comparing it to the libraries listed below.
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆437 Updated 3 weeks ago
- Fast and Furious AMD Kernels ☆331 Updated 2 weeks ago
- Helpful kernel tutorials and examples for tile-based GPU programming ☆554 Updated this week
- AI Tensor Engine for ROCm ☆330 Updated last week
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆294 Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆182 Updated this week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆433 Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆148 Updated this week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆182 Updated 2 weeks ago
- An experimental CPU backend for Triton ☆168 Updated 2 months ago
- Nvidia Instruction Set Specification Generator ☆309 Updated last year
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆706 Updated this week
- Fastest kernels written from scratch ☆517 Updated 3 months ago
- kernels, of the mega variety ☆640 Updated 3 months ago
- High-Performance SGEMM on CUDA devices ☆115 Updated 11 months ago
- MLIR-based partitioning system ☆157 Updated this week
- Open ABI and FFI for Machine Learning Systems ☆293 Updated this week
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆193 Updated 5 months ago
- ☆127 Updated 2 months ago
- torchcomms: a modern PyTorch communications API ☆319 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆219 Updated 11 months ago
- Shared Middle-Layer for Triton Compilation ☆321 Updated last month
- Backward compatible ML compute opset inspired by HLO/MHLO ☆589 Updated 3 weeks ago
- Perplexity open source garden for inference technology ☆324 Updated 2 weeks ago
- Hand-Rolled GPU communications library ☆76 Updated last month
- A Quirky Assortment of CuTe Kernels ☆741 Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆162 Updated last week
- GPU documentation for humans ☆478 Updated last month
- OpenAI Triton backend for Intel® GPUs ☆223 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆368 Updated this week