rbga / CUDA-Merge-and-Bitonic-Sort
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sorting of large arrays. Includes both CPU and GPU versions, along with a performance comparison.
☆19Updated 2 years ago
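For readers unfamiliar with bitonic sort, the following is a minimal CPU-side sketch of the sorting network in C++. It is not taken from the repository; the function name `bitonic_sort` and the power-of-two size restriction are assumptions made for illustration. The comments mark the loop whose iterations are independent, which is the part a GPU version would typically assign to one thread per index.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `data` in ascending order with a bitonic sorting network.
// Assumption (illustrative): data.size() is a power of two; other sizes
// would need padding with a sentinel value first.
void bitonic_sort(std::vector<int>& data) {
    const std::size_t n = data.size();
    assert((n & (n - 1)) == 0 && "size must be a power of two");

    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k <<= 1) {
        // j: distance between the two elements of each compare-exchange.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            // Within one (k, j) step every index i touches a disjoint pair,
            // so this loop is the one a GPU kernel would run in parallel.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    // Blocks of size k alternate between ascending and
                    // descending order to keep the sequence bitonic.
                    const bool ascending = (i & k) == 0;
                    if ((data[i] > data[partner]) == ascending) {
                        std::swap(data[i], data[partner]);
                    }
                }
            }
        }
    }
}
```

Calling `bitonic_sort(v)` on a `std::vector<int>` whose length is a power of two sorts it in place. The network performs the same sequence of compare-exchange steps regardless of the input values, which is what makes it a natural fit for data-parallel hardware.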
Alternatives and similar repositories for CUDA-Merge-and-Bitonic-Sort
Users who are interested in CUDA-Merge-and-Bitonic-Sort are comparing it to the repositories listed below
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]☆74Updated 3 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆145Updated 5 years ago
- CUDA Matrix Multiplication Optimization☆239Updated last year
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆70Updated last year
- CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. …☆459Updated 2 years ago
- CUTLASS and CuTe Examples☆104Updated this week
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…☆75Updated 4 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.☆394Updated 10 months ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial☆332Updated 2 weeks ago
- Training material for Nsight developer tools☆173Updated last year
- Examples from Programming in Parallel with CUDA☆164Updated 2 years ago
- My notes on various HPC papers.☆24Updated 2 years ago
- Optimize GEMM with tensorcore step by step☆32Updated last year
- Machine Learning Compiler Road Map☆45Updated 2 years ago
- ☆17Updated last year
- An extension library of WMMA API (Tensor Core API)☆109Updated last year
- DGEMM on KNL, achieving 75% of MKL performance☆19Updated 3 years ago
- ☆156Updated 11 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆190Updated 10 months ago
- IMPACT GPU Algorithms Teaching Labs☆58Updated 2 years ago
- ☆109Updated last year
- Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT]☆318Updated 3 years ago
- ☆154Updated 6 months ago
- Solutions to Programming Massively Parallel Processors: A Hands-on Approach, 2nd Edition☆33Updated 3 years ago
- Step-by-step optimization of CUDA SGEMM☆402Updated 3 years ago
- A simple high performance CUDA GEMM implementation.☆418Updated last year
- Solution of Programming Massively Parallel Processors☆50Updated last year
- ☆71Updated 11 years ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software☆58Updated 9 months ago
- ☆112Updated 6 months ago