ShadyBoukhary / GPU-research-FFT-OpenACC-CUDA
Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix M…
☆13Updated 6 years ago
Alternatives and similar repositories for GPU-research-FFT-OpenACC-CUDA:
Users that are interested in GPU-research-FFT-OpenACC-CUDA are comparing it to the libraries listed below
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19Updated 10 months ago
- My notes on various HPC papers.☆22Updated 2 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆74Updated this week
- fast Fourier transform on GPU in shared memory for AstroAccelerate project☆26Updated 4 years ago
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only.☆13Updated 9 months ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks☆23Updated last year
- Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU.☆37Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability☆38Updated 9 years ago
- Multi-GPU communication profiler and visualizer☆28Updated 9 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆31Updated 3 months ago
- An extension library of WMMA API (Tensor Core API)☆93Updated 8 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆25Updated 5 months ago
- CUDA 12.2 HMM demos☆19Updated 8 months ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆50Updated 2 weeks ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆127Updated 4 years ago
- Triton to TVM transpiler.☆19Updated 5 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆180Updated 2 months ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆59Updated 6 months ago
- Benchmarking OpenBLAS on the Apple M1☆18Updated 4 years ago
- A practical way of learning Swizzle☆16Updated 2 months ago
- An MLIR-based toy DL compiler for TVM Relay.☆58Updated 2 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs☆106Updated this week
- Machine Learning System☆14Updated 4 years ago