upsj / gpu_selection
Parallel selection on GPUs
☆14 Updated 3 years ago
Related projects
Alternatives and complementary repositories for gpu_selection
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- ☆14 Updated 2 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆35 Updated 7 years ago
- A GPU accelerated error-bounded lossy compression for scientific data. ☆64 Updated this week
- A library to benchmark CUDA code, similar to google benchmark. ☆28 Updated 3 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 Updated 7 years ago
- Kernel Tuning Toolkit ☆55 Updated last week
- A Library for fast Hash Tables on GPUs ☆109 Updated 2 years ago
- ☆57 Updated this week
- ☆30 Updated this week
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆83 Updated 8 months ago
- A warp-oriented dynamic hash table for GPUs ☆71 Updated 9 months ago
- ☆20 Updated 5 years ago
- High-performance, GPU-aware communication library ☆84 Updated 2 weeks ago
- An extension library of WMMA API (Tensor Core API) ☆82 Updated 3 months ago
- An implementation of BLAS using the SYCL open standard. ☆259 Updated last week
- CUDA kernel author's tools ☆107 Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆65 Updated last year
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆91 Updated 2 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆20 Updated 6 years ago
- Online CUDA Occupancy Calculator ☆66 Updated 3 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆65 Updated last year
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all… ☆25 Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆114 Updated 4 years ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆127 Updated 4 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆99 Updated this week
- Full-speed Array of Structures access