mohamed / roofline
A simple script to plot the Roofline model for given HW platforms and applications
☆9 Updated 8 months ago
Alternatives and similar repositories for roofline:
Users who are interested in roofline are comparing it to the repositories listed below:
- ☆25 Updated 5 years ago
- ☆17 Updated 3 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆39 Updated 9 years ago
- ☆14 Updated 5 years ago
- ☆33 Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 Updated 5 years ago
- Horizontal Fusion ☆23 Updated 3 years ago
- GPU Performance Advisor ☆64 Updated 2 years ago
- ☆11 Updated 4 years ago
- ☆43 Updated 4 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆25 Updated 2 months ago
- DietCode Code Release ☆63 Updated 2 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 6 months ago
- ☆25 Updated 4 years ago
- CUDAAdvisor: a GPU profiling tool ☆49 Updated 6 years ago
- A Benchmark Suite for Heterogeneous System Computation ☆53 Updated 2 months ago
- Public Release of Stream-Dataflow ☆14 Updated 5 years ago
- ☆13 Updated 3 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆49 Updated last year
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆108 Updated 2 years ago
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/… ☆18 Updated last year
- ☆23 Updated 2 years ago
- Performance Prediction Toolkit ☆51 Updated 4 months ago
- Benchmark suite containing cache filtered traces for use with Ramulator. These include some of the workloads used in our SIGMETRICS 2019 … ☆22 Updated 4 years ago
- ☆30 Updated 2 years ago
- ☆14 Updated 3 years ago
- TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together ☆64 Updated 6 years ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 Updated 5 years ago
- ngAP's artifact for ASPLOS'24 ☆23 Updated 3 months ago