hipdac-lab / ICS23-GPULZ
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs
☆14 Updated 8 months ago
Related projects
Alternatives and complementary repositories for ICS23-GPULZ
- GPU-accelerated error-bounded lossy compression for scientific data. ☆65 Updated this week
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆80 Updated 7 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 Updated last month
- A Top-Down Profiler for GPU Applications ☆13 Updated 8 months ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 Updated 10 months ago
- RCCL Performance Benchmark Tests ☆50 Updated 3 weeks ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆36 Updated this week
- Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP) ☆12 Updated 3 months ago
- ☆41 Updated 4 years ago
- ☆14 Updated 6 months ago
- An HPL-AI implementation for Fugaku ☆19 Updated 3 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27 Updated 2 months ago
- GPU Performance Advisor ☆63 Updated 2 years ago
- ROCm Tracer Callback/Activity Library for performance tracing of AMD GPUs ☆75 Updated last week
- GPUDirect Async support for IB Verbs ☆90 Updated 2 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆29 Updated 2 months ago
- DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression ☆11 Updated 4 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 Updated 7 years ago
- Linux Cross-Memory Attach ☆88 Updated 2 months ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆20 Updated 6 years ago
- InstLatX64_Demo ☆41 Updated last week
- A task benchmark ☆40 Updated 3 months ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆44 Updated 3 years ago
- PyTorch process group third-party plugin for UCC ☆20 Updated 7 months ago
- Tools to create performance and roofline plots from measured data ☆58 Updated 10 years ago
- Bandwidth test for ROCm ☆49 Updated this week
- ☆47 Updated 5 years ago
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels ☆31 Updated 3 years ago
- A GPU FP32 computation method with Tensor Cores. ☆18 Updated 2 years ago
- ☆17 Updated 10 months ago