NVIDIA / grace-cpu-benchmarking-guide
Guides and examples to help achieve optimal performance on an NVIDIA Grace CPU
☆11, updated last month
Related projects:
- Python Interface to HIP and hiprtc Library (☆9, updated 10 months ago)
- Bandwidth test for ROCm (☆45, updated this week)
- A tracing JIT compiler for PyTorch (☆12, updated 2 years ago)
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs (☆14, updated 3 years ago)
- This is the open source version of HPL-MXP. The code performance has been verified on Frontier (☆16, updated last year)
- A Top-Down Profiler for GPU Applications (☆13, updated 6 months ago)
- An HPL-AI implementation for Fugaku (☆19, updated 3 years ago)
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. (☆20, updated 3 months ago)
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) (☆27, updated this week)
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. (☆41, updated 3 weeks ago)
- AMD’s C++ library for accelerating tensor primitives (☆35, updated this week)
- HIP Python Low-level Bindings (☆16, updated this week)
- MLPerf™ logging library (☆30, updated last week)
- An I/O benchmark for deep learning applications (☆61, updated 2 weeks ago)
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… (☆43, updated 2 years ago)
- Intel® SHMEM - Device initiated shared memory based communication library (☆15, updated 2 months ago)
- Vendor-neutral library for exposing power and performance features across diverse architectures (☆67, updated last month)
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs (☆14, updated 6 months ago)
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi (☆36, updated last year)
- A tracing JIT for PyTorch (☆18, updated 2 years ago)
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions (☆23, updated last month)
- A unified framework across multiple programming platforms (☆28, updated 3 months ago)
- Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser (☆12, updated 3 years ago)
- Random number library that generates pseudo-random and quasi-random numbers. (☆23, updated this week)
- Benchmarks to capture important workloads. (☆28, updated 3 months ago)
- GPU based Compressed Graph Traversal (☆15, updated last year)
- PyTorch UCC plugin (☆15, updated 3 years ago)
- Information about AVX-512 support on recent Intel processors (☆41, updated 2 years ago)
- MPI accelerator-integrated communication extensions (☆33, updated last year)