NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆83 Updated 11 months ago
Alternatives and similar repositories for cub:
Users who are interested in cub are comparing it to the repositories listed below.
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- An extension library for the WMMA API (Tensor Core API) ☆88 Updated 7 months ago
- MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels. ☆127 Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL ☆110 Updated 8 months ago
- Efficient Top-K implementation on the GPU ☆151 Updated 5 years ago
- Conversion to/from half-precision floating point formats ☆341 Updated 6 months ago
- Dissecting NVIDIA GPU Architecture ☆88 Updated 2 years ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆233 Updated this week
- TopHub AutoTVM log collections ☆70 Updated 2 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆212 Updated 3 years ago
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS. ☆67 Updated 5 years ago
- ☆40 Updated 4 years ago
- Training material for Nsight developer tools ☆147 Updated 6 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆105 Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆263 Updated last month
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆36 Updated 7 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆75 Updated last year
- ☆38 Updated 3 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆80 Updated 5 years ago
- ☆60 Updated 2 months ago
- cuDNN sample codes provided by NVIDIA ☆45 Updated 6 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆148 Updated last year
- ☆65 Updated 11 years ago
- ☆87 Updated 10 months ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly. ☆113 Updated 2 years ago
- CUDA Tensor Transpose (cuTT) library ☆51 Updated 7 years ago
- CUDA Matrix Multiplication Optimization ☆161 Updated 7 months ago
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆79 Updated last year
- Next-generation SPARSE implementation for the ROCm platform ☆119 Updated this week
- ☆93 Updated 8 years ago