cloudcores / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆529 Updated 2 years ago
Alternatives and similar repositories for CuAssembler
Users who are interested in CuAssembler are comparing it to the repositories listed below
- Assembler for NVIDIA Volta and Turing GPUs ☆229 Updated 3 years ago
- collection of benchmarks to measure basic GPU capabilities ☆408 Updated 6 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆449 Updated this week
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆373 Updated 7 months ago
- Shared Middle-Layer for Triton Compilation ☆273 Updated this week
- Yinghan's Code Sample ☆344 Updated 3 years ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆460 Updated 11 months ago
- A model compilation solution for various hardware ☆445 Updated last week
- Development repository for the Triton-Linalg conversion ☆193 Updated 6 months ago
- An easy-to-understand TensorOp Matmul Tutorial ☆374 Updated 11 months ago
- A simple high performance CUDA GEMM implementation. ☆396 Updated last year
- An MLIR-based compiler framework bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures). ☆617 Updated last week
- Step-by-step optimization of CUDA SGEMM ☆367 Updated 3 years ago
- CUDA Matrix Multiplication Optimization ☆217 Updated last year
- CUDA Kernel Benchmarking Library ☆701 Updated last week
- row-major matmul optimization ☆658 Updated last week
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading. ☆152 Updated 3 years ago
- ☆271 Updated 2 months ago
- ☆106 Updated last year
- OpenAI Triton backend for Intel® GPUs ☆204 Updated this week
- ☆150 Updated 8 months ago
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆136 Updated last month
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆84 Updated 2 years ago
- BLISlab: A Sandbox for Optimizing GEMM ☆534 Updated 4 years ago
- ☆147 Updated this week
- 14 basic topics for VEGA64 performance optimization ☆61 Updated 4 years ago
- Dissecting NVIDIA GPU Architecture ☆104 Updated 3 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆112 Updated 3 months ago
- A CPU tool for benchmarking the peak of floating points ☆559 Updated last month
- ☆196 Updated 2 years ago