ShadyBoukhary / GPU-research-FFT-OpenACC-CUDA
Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix M…
☆14, updated 7 years ago
Alternatives and similar repositories for GPU-research-FFT-OpenACC-CUDA
Users who are interested in GPU-research-FFT-OpenACC-CUDA are comparing it to the libraries listed below.
- An MLIR-based AI compiler from a Python frontend to RISC-V DSAs (☆10, updated 10 months ago)
- My notes on various HPC papers (☆22, updated 2 years ago)
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs (☆32, updated 4 months ago)
- No description provided (☆18, updated 3 years ago)
- IMPACT GPU Algorithms Teaching Labs (☆58, updated 2 years ago)
- No description provided (☆16, updated 2 years ago)
- Code for the paper "Engineering a High-Performance GPU B-Tree", accepted to PPoPP 2019 (☆57, updated 3 years ago)
- Slides and exercises for a persistent memory programming tutorial (☆14, updated 2 years ago)
- GPU Performance Advisor (☆66, updated 3 years ago)
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks (☆24, updated last year)
- A GPU FP32 computation method with Tensor Cores (☆21, updated 2 years ago)
- No description provided (☆14, updated 6 years ago)
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only (☆15, updated last year)
- Multi-GPU communication profiler and visualizer (☆31, updated last year)
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems” (☆43, updated 2 years ago)
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS (☆31, updated 6 months ago)
- A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling (☆42, updated 3 years ago)
- TiledKernel is a code generation library based on macro kernels and a memory-hierarchy graph data structure (☆19, updated last year)
- A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs (☆19, updated 2 years ago)
- gossip: Efficient Communication Primitives for Multi-GPU Systems (☆59, updated 3 years ago)
- No description provided (☆27, updated 6 months ago)
- Official page for 18-847C (Spring '22): Data Center Computing (☆16, updated 3 years ago)
- Performance Prediction Toolkit (☆52, updated 8 months ago)
- No description provided (☆12, updated 4 months ago)
- Machine Learning System (☆14, updated 5 years ago)
- An efficient concurrent graph processing system (☆46, updated 3 years ago)
- High-performance RDMA-based distributed feature collection component for training GNN models on extremely large graphs (☆55, updated 3 years ago)
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs (☆15, updated 4 years ago)
- Lightning In-Memory Object Store (☆47, updated 3 years ago)
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications (☆25, updated 10 months ago)