ShadyBoukhary / GPU-research-FFT-OpenACC-CUDA
Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix M…
☆13 Updated 6 years ago
Alternatives and similar repositories for GPU-research-FFT-OpenACC-CUDA:
Users who are interested in GPU-research-FFT-OpenACC-CUDA are comparing it to the repositories listed below:
- Emulating DMA Engines on GPUs for Performance and Portability ☆37 Updated 9 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 4 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆30 Updated 2 months ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆21 Updated last year
- TiledKernel is a code generation library based on macro kernels and a memory hierarchy graph data structure. ☆19 Updated 9 months ago
- My notes on various HPC papers. ☆21 Updated 2 years ago
- ☆42 Updated 4 years ago
- Hardware test for CPU, GPU, I/O, and memory bandwidth performance ☆25 Updated 6 years ago
- A method for efficiently processing SpMV using SIMD and load balancing ☆16 Updated 2 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆36 Updated 7 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆124 Updated 4 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆21 Updated 6 years ago
- Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU. ☆37 Updated last year
- OpenSHMEM Reference Implementation over UCX for Specification 1.4 and up ☆34 Updated last year
- GPU Performance Advisor ☆64 Updated 2 years ago
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆56 Updated this week
- ☆17 Updated 2 years ago
- Fast Fourier transform on GPU in shared memory for AstroAccelerate project ☆26 Updated 4 years ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆61 Updated 8 months ago
- ☆17 Updated 5 years ago
- An Attention Superoptimizer ☆21 Updated last month
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all… ☆31 Updated last year
- Slides and exercises for persistent memory programming tutorial ☆12 Updated 2 years ago
- An extension library of WMMA API (Tensor Core API) ☆89 Updated 7 months ago
- CUDA 12.2 HMM demos ☆19 Updated 6 months ago
- A GPU FP32 computation method with Tensor Cores. ☆20 Updated 2 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- QCD for Intel Xeon Phi and Xeon processors ☆14 Updated 11 months ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆19 Updated last week