cloudcores / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆483 · Updated 2 years ago
Alternatives and similar repositories for CuAssembler:
Users who are interested in CuAssembler are comparing it to the libraries listed below:
- Assembler for NVIDIA Volta and Turing GPUs ☆217 · Updated 3 years ago
- Collection of benchmarks to measure basic GPU capabilities ☆359 · Updated 2 months ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor cores with WMMA API and MMA PTX instruct… ☆393 · Updated 7 months ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆337 · Updated 3 months ago
- Shared Middle-Layer for Triton Compilation ☆245 · Updated this week
- A simple high performance CUDA GEMM implementation. ☆361 · Updated last year
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆383 · Updated this week
- Yinghan's Code Sample ☆323 · Updated 2 years ago
- An Easy-to-Understand TensorOp Matmul Tutorial ☆342 · Updated 7 months ago
- Development repository for the Triton-Linalg conversion ☆185 · Updated 2 months ago
- A model compilation solution for various hardware ☆424 · Updated last week
- ☆410 · Updated this week
- CUDA Kernel Benchmarking Library ☆620 · Updated this week
- CUDA Matrix Multiplication Optimization ☆179 · Updated 9 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆130 · Updated last year
- An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures). ☆584 · Updated 2 weeks ago
- Row-major matmul optimization ☆624 · Updated last year
- Step-by-step optimization of CUDA SGEMM ☆308 · Updated 3 years ago
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆88 · Updated last month
- ☆192 · Updated 2 years ago
- OpenAI Triton backend for Intel® GPUs ☆179 · Updated this week
- A CPU tool for benchmarking the peak of floating points ☆532 · Updated this week
- Dissecting NVIDIA GPU Architecture ☆91 · Updated 2 years ago
- ☆198 · Updated 9 months ago
- This is the top-level repository for the Accel-Sim framework. ☆395 · Updated this week
- ☆241 · Updated 2 months ago
- Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs. ☆78 · Updated 2 years ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆373 · Updated 2 weeks ago
- ☆95 · Updated last year
- A home for the final text of all TVM RFCs. ☆102 · Updated 6 months ago