mohamed / roofline
A simple script to plot the Roofline model for given HW platforms and applications
☆9 · Updated 3 weeks ago
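As a rough illustration of what a Roofline plot encodes, the sketch below computes the attainable performance bound for a platform and a few kernels. It is not taken from this repository: the function names, the platform numbers, and the kernel names and arithmetic intensities are all hypothetical, chosen only to show the standard formula, attainable performance = min(peak compute, memory bandwidth × arithmetic intensity).

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound in GFLOP/s for a kernel with the given
    arithmetic intensity (FLOPs per byte moved to/from memory)."""
    if intensity < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    # The kernel is limited by whichever roof is lower:
    # the flat compute roof or the sloped memory roof.
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


# Hypothetical platform (assumed values, not measurements of real hardware).
PEAK_GFLOPS = 1000.0   # peak compute throughput, GFLOP/s
BANDWIDTH_GBS = 100.0  # peak memory bandwidth, GB/s

# Hypothetical kernels: (name, arithmetic intensity in FLOP/byte).
KERNELS = [("stream-like", 0.25), ("stencil-like", 4.0), ("gemm-like", 50.0)]

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for name, ai in KERNELS:
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"{name}: AI={ai} FLOP/byte, bound={bound} GFLOP/s ({regime})")
```

A plotting script would typically evaluate the same bound over a range of intensities and draw it on log-log axes, with each application placed as a point at its measured intensity and throughput.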
Related projects:
- Thinking is hard - automate it ☆18 · Updated 2 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆31 · Updated 4 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆46 · Updated 5 months ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 · Updated 4 years ago
- Benchmark for matrix multiplications between dense and block sparse (BSR) matrix in TVM, blocksparse (Gray et al.) and cuSparse. ☆24 · Updated 4 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆18 · Updated last year
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆17 · Updated 2 years ago
- CUDAAdvisor: a GPU profiling tool ☆48 · Updated 6 years ago
- MLSys 2021 paper: MicroRec: efficient recommendation inference by hardware and data structure solutions ☆13 · Updated 3 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆32 · Updated 9 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆57 · Updated 6 years ago
- Fibertree emulator ☆11 · Updated last month
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆20 · Updated 3 months ago
- GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated… ☆30 · Updated last month
- Source code of the simulator used in the Mosaic paper from MICRO 2017: "Mosaic: A GPU Memory Manager with Application-Transparent Support… ☆40 · Updated 6 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27 · Updated last year
- GPU Performance Advisor ☆58 · Updated 2 years ago