rbga / CUDA-Merge-and-Bitonic-Sort
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sorting of large arrays. Includes both CPU and GPU versions, along with a performance comparison.
☆20Updated 2 years ago
Alternatives and similar repositories for CUDA-Merge-and-Bitonic-Sort
Users who are interested in CUDA-Merge-and-Bitonic-Sort are comparing it to the repositories listed below:
- CUDA Matrix Multiplication Optimization☆245Updated last year
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]☆75Updated 3 years ago
- ☆17Updated last year
- Examples from Programming in Parallel with CUDA☆167Updated 2 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆146Updated 5 years ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…☆76Updated 4 years ago
- An extension library of WMMA API (Tensor Core API)☆109Updated last year
- Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.☆158Updated 3 years ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆70Updated last year
- Chinese translation of the CUDA PTX-ISA document☆47Updated 2 months ago
- CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. …☆462Updated 2 years ago
- This project covers convolution operator optimization on GPU, including GEMM-based (implicit GEMM) convolution.☆41Updated 2 months ago
- ☆156Updated 11 months ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.☆397Updated 11 months ago
- Training material for Nsight developer tools☆173Updated last year
- CUTLASS and CuTe Examples☆112Updated 2 weeks ago
- A simple high performance CUDA GEMM implementation.☆421Updated last year
- Optimize GEMM with tensorcore step by step☆36Updated 2 years ago
- ☆22Updated 6 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆190Updated 10 months ago
- IMPACT GPU Algorithms Teaching Labs☆58Updated 2 years ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA☆40Updated 6 years ago
- Solution of Programming Massively Parallel Processors☆50Updated last year
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.☆13Updated 2 years ago
- Step-by-step optimization of CUDA SGEMM☆414Updated 3 years ago
- ☆110Updated last year
- ☆274Updated last month
- ☆163Updated 7 months ago
- ☆71Updated 11 years ago
- Dissecting NVIDIA GPU Architecture☆115Updated 3 years ago