rbga / CUDA-Merge-and-Bitonic-Sort
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sorting of large arrays. Includes both CPU and GPU versions, along with a performance comparison.
☆18 · Updated 2 years ago
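The repository's kernels are not reproduced on this page. The C++ below is a minimal sketch of the two algorithms it names. It is our own illustration, not the repository's code. These are CPU reference versions, and their loop structure shows what a CUDA port would parallelize. The function names `bitonic_sort` and `merge_sort` are hypothetical, and `bitonic_sort` assumes the input length is a power of two.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Bitonic sort written as a sorting network (CPU reference sketch).
// Assumption: a.size() is a power of two. In a CUDA version, each (k, j)
// stage would typically be one kernel launch, with one thread per index i.
void bitonic_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the bitonic sequences being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare-exchange distance in this stage
            for (std::size_t i = 0; i < n; ++i) {       // iterations are independent: one GPU thread each
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;  // direction of this sub-sequence
                    if ((a[i] > a[partner]) == ascending) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
}

// Bottom-up merge sort: each pass merges adjacent sorted runs of length
// `width` into a scratch buffer. All merges within one pass are independent,
// which is the work a GPU version can spread across threads or blocks.
void merge_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<int> tmp(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            std::merge(a.begin() + lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + hi,
                       tmp.begin() + lo);
        }
        a.swap(tmp);  // the merged runs become the input of the next pass
    }
}
```

In the bitonic version, every compare-exchange within a stage touches a disjoint pair of elements, so a stage maps directly onto one thread per element. In merge sort, the number of independent merges halves with each pass. Later passes therefore expose less parallelism unless individual merges are themselves split across threads.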
Alternatives and similar repositories for CUDA-Merge-and-Bitonic-Sort
Users who are interested in CUDA-Merge-and-Bitonic-Sort are comparing it to the repositories listed below.
- CUDA Matrix Multiplication Optimization ☆228 · Updated last year
- Examples from Programming in Parallel with CUDA ☆161 · Updated 2 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆143 · Updated 5 years ago
- Personal Notes for Learning HPC & Parallel Computation [Actively Adding New Content] ☆74 · Updated 3 years ago
- An extension library of WMMA API (Tensor Core API) ☆106 · Updated last year
- Flash Attention in raw CUDA C beating PyTorch ☆31 · Updated last year
- Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores. ☆67 · Updated last year
- ☆153 · Updated 9 months ago
- Optimize GEMM with Tensor Cores step by step ☆32 · Updated last year
- CUTLASS and CuTe Examples ☆89 · Updated last week
- Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL, even under multithreading. ☆152 · Updated 3 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance. ☆385 · Updated 9 months ago
- ☆17 · Updated last year
- A simple high-performance CUDA GEMM implementation. ☆409 · Updated last year
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆305 · Updated last month
- ☆115 · Updated last year
- ☆25 · Updated 2 months ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆54 · Updated 7 months ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆74 · Updated 4 years ago
- Step-by-step optimization of CUDA SGEMM ☆387 · Updated 3 years ago
- Dissecting NVIDIA GPU Architecture ☆109 · Updated 3 years ago
- A tutorial for CUDA & PyTorch ☆155 · Updated 8 months ago
- This project is about convolution operator optimization on GPU, including GEMM-based (implicit GEMM) convolution. ☆39 · Updated 2 weeks ago
- Examples of CUDA implementations using CUTLASS CuTe ☆241 · Updated 3 months ago
- IMPACT GPU Algorithms Teaching Labs ☆58 · Updated 2 years ago
- CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. … ☆446 · Updated 2 years ago
- ☆107 · Updated 5 months ago
- ☆148 · Updated 5 months ago
- ☆14 · Updated 6 years ago
- ☆69 · Updated 9 months ago