cloudcores / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully :)
☆539 · Updated 2 years ago
Alternatives and similar repositories for CuAssembler
Users who are interested in CuAssembler are comparing it to the repositories listed below.
- Assembler for NVIDIA Volta and Turing GPUs ☆230 · Updated 3 years ago
- collection of benchmarks to measure basic GPU capabilities ☆422 · Updated 7 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆471 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆288 · Updated last week
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆384 · Updated 9 months ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆480 · Updated last year
- Yinghan's Code Sample ☆351 · Updated 3 years ago
- CUDA Kernel Benchmarking Library ☆733 · Updated this week
- A model compilation solution for various hardware ☆450 · Updated last month
- An MLIR-based compiler framework bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures). ☆636 · Updated last week
- Development repository for the Triton-Linalg conversion ☆202 · Updated 8 months ago
- CUDA Matrix Multiplication Optimization ☆223 · Updated last year
- ☆421 · Updated this week
- An Easy-to-understand TensorOp Matmul Tutorial ☆378 · Updated last year
- ☆282 · Updated 2 weeks ago
- A simple high performance CUDA GEMM implementation. ☆409 · Updated last year
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆144 · Updated 2 months ago
- Dissecting NVIDIA GPU Architecture ☆105 · Updated 3 years ago
- Step-by-step optimization of CUDA SGEMM ☆386 · Updated 3 years ago
- This is the top-level repository for the Accel-Sim framework. ☆481 · Updated last week
- MLIR Sample dialect ☆129 · Updated 7 months ago
- ☆108 · Updated last year
- ☆153 · Updated 9 months ago
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading. ☆151 · Updated 3 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆117 · Updated 4 months ago
- OpenAI Triton backend for Intel® GPUs ☆210 · Updated last week
- row-major matmul optimization ☆674 · Updated last month
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆85 · Updated 2 years ago
- ☆144 · Updated 5 months ago
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆921 · Updated this week