ShadyBoukhary / GPU-research-FFT-OpenACC-CUDA
Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix M…
☆10 Updated 6 years ago
Related projects
Alternatives and complementary repositories for GPU-research-FFT-OpenACC-CUDA
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆29 Updated 2 months ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆34 Updated 9 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 Updated last month
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction. ☆11 Updated last year
- My notes on various HPC papers. ☆21 Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 Updated 6 months ago
- GPU Performance Advisor ☆63 Updated 2 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆17 Updated 2 years ago
- StarPU Runtime system ☆16 Updated 14 years ago
- Dynamic matrix type and algorithms for sparse matrices ☆17 Updated last year
- ☆17 Updated 2 years ago
- Multi-GPU communication profiler and visualizer ☆18 Updated 5 months ago
- Machine Learning System ☆14 Updated 4 years ago
- ☆15 Updated 5 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆46 Updated 2 months ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆20 Updated 6 years ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆21 Updated last year
- Official page for 18-847C (Spring '22): Data Center Computing ☆16 Updated 2 years ago
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆15 Updated 3 years ago
- Finite Field Operations on GPGPU ☆14 Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 Updated 10 months ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆52 Updated 2 years ago
- ☆11 Updated 3 years ago
- An Attention Superoptimizer ☆20 Updated 6 months ago
- PTX-EMU is a simple emulator for CUDA programs. ☆24 Updated 10 months ago
- A parallel framework for training deep neural networks ☆45 Updated 2 weeks ago
- A task benchmark ☆40 Updated 3 months ago
- Performance Prediction Toolkit ☆51 Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆14 Updated 5 years ago