jhson989 / cuda-ptx
Inline PTX Assembly in CUDA example
☆13 Updated 3 years ago
Alternatives and similar repositories for cuda-ptx
Users who are interested in cuda-ptx are comparing it to the libraries listed below.
- Learn OpenCL step by step. ☆136 Updated 3 years ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆58 Updated 9 months ago
- ☆15 Updated 3 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX ☆217 Updated 9 months ago
- An extension library of WMMA API (Tensor Core API) ☆109 Updated last year
- CUDA Matrix Multiplication Optimization ☆241 Updated last year
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction. ☆13 Updated 2 years ago
- Training material for Nsight developer tools ☆173 Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆135 Updated 2 years ago
- Collection of easy, well-documented and useful OpenCL examples in C++. ☆83 Updated 3 years ago
- cuDNN sample codes provided by Nvidia ☆46 Updated 6 years ago
- ☆46 Updated 5 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆234 Updated 3 years ago
- ☆71 Updated 11 years ago
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆173 Updated 4 months ago
- Tenstorrent MLIR compiler ☆215 Updated this week
- A tool for examining GPU scheduling behavior. ☆89 Updated last year
- collection of benchmarks to measure basic GPU capabilities ☆461 Updated last month
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆52 Updated this week
- A profiler to disclose and quantify hardware features on GPUs. ☆175 Updated 3 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆122 Updated 3 weeks ago
- ☆159 Updated this week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆109 Updated 8 years ago
- ☆109 Updated last year
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆94 Updated 2 years ago
- amdgpu example code in hip/asm ☆46 Updated this week
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast! ☆112 Updated 4 months ago
- MLIR-based toolkit targeting intel heterogeneous hardware ☆49 Updated 9 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆145 Updated last week
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated last year