0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆188 Updated 5 months ago
Alternatives and similar repositories for DocumentSASS
Users who are interested in DocumentSASS are comparing it to the libraries listed below.
- Dissecting NVIDIA GPU Architecture ☆115 Updated 3 years ago
- ☆110 Updated last year
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆143 Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆124 Updated last month
- ☆161 Updated this week
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆83 Updated 3 months ago
- ☆46 Updated 6 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆146 Updated last week
- An extension library of WMMA API (Tensor Core API) ☆109 Updated last year
- ☆53 Updated 7 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆236 Updated 3 years ago
- amdgpu example code in hip/asm ☆46 Updated 2 weeks ago
- An experimental CPU backend for Triton ☆167 Updated last month
- ☆293 Updated 3 months ago
- ☆47 Updated 5 years ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆138 Updated 11 months ago
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. ☆85 Updated 2 months ago
- GPU Performance Advisor ☆65 Updated 3 years ago
- ☆54 Updated 6 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆219 Updated 10 months ago
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆95 Updated 2 years ago
- ☆74 Updated 7 months ago
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin. ☆80 Updated last week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆104 Updated 6 months ago
- CUTLASS and CuTe Examples ☆114 Updated last month
- MLIR Sample dialect ☆134 Updated last week
- Open ABI and FFI for Machine Learning Systems ☆262 Updated this week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆564 Updated 2 years ago
- Tutorials for NVIDIA CUPTI samples ☆47 Updated last month
- TPP experimentation on MLIR for linear algebra ☆141 Updated 2 weeks ago