NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,002 Updated 2 years ago
Alternatives and similar repositories for maxas
Users who are interested in maxas are comparing it to the libraries listed below.
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆502 Updated 2 years ago
- Patterns and behaviors for GPU computing ☆1,717 Updated 2 years ago
- BLISlab: A Sandbox for Optimizing GEMM ☆525 Updated 3 years ago
- Source code examples from the Parallel Forall Blog ☆1,287 Updated 10 months ago
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,747 Updated last year
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆875 Updated this week
- CUDA Kernel Benchmarking Library ☆650 Updated this week
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆536 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆218 Updated 3 years ago
- CUDA Data Parallel Primitives Library ☆431 Updated 6 years ago
- A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology ☆1,112 Updated this week
- ☆1,878 Updated last year
- a software library containing BLAS functions written in OpenCL ☆855 Updated 9 months ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆715 Updated 3 months ago
- A code generator for array-based code on CPUs and GPUs ☆604 Updated last week
- ☆538 Updated this week
- Low-precision matrix multiplication ☆1,803 Updated last year
- HIPIFY: Convert CUDA to Portable C++ Code ☆580 Updated this week
- Source code that accompanies The CUDA Handbook. ☆525 Updated 3 months ago
- Open single and half precision gemm implementations ☆381 Updated 2 years ago
- Demonstration of various hardware effects on CUDA GPUs. ☆378 Updated last year
- A CPU tool for benchmarking the peak of floating points ☆544 Updated 3 weeks ago
- GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for… ☆1,339 Updated 3 months ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆353 Updated 4 months ago
- Winograd minimal convolution algorithm generator for convolutional neural networks. ☆618 Updated 4 years ago
- collection of benchmarks to measure basic GPU capabilities ☆376 Updated 3 months ago
- row-major matmul optimization ☆634 Updated last year
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆415 Updated 8 months ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆401 Updated 4 months ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆241 Updated this week