NVIDIA / grace-cpu-benchmarking-guide
Guides and examples to help achieve optimal performance on an NVIDIA Grace CPU
☆12Updated 3 months ago
Related projects
Alternatives and complementary repositories for grace-cpu-benchmarking-guide
- This is the open source version of HPL-MXP. The code performance has been verified on Frontier☆16Updated last year
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems.☆44Updated last month
- Bandwidth test for ROCm☆49Updated this week
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs☆15Updated 3 years ago
- An HPL-AI implementation for Fugaku☆19Updated 3 years ago
- Computing the greatest common divisor with transformers, source code for the paper https://arxiv.org/abs/2308.15594☆12Updated 7 months ago
- A tracing JIT compiler for PyTorch☆12Updated 2 years ago
- Graph-indexed Pandas DataFrames for analyzing hierarchical performance data☆30Updated 3 weeks ago
- A Top-Down Profiler for GPU Applications☆13Updated 8 months ago
- Simplified Interface to Complex Memory☆26Updated last year
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions☆22Updated 3 weeks ago
- Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.☆11Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆35Updated 6 months ago
- MPI accelerator-integrated communication extensions☆32Updated last year
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆22Updated last month
- Intel® SHMEM - Device initiated shared memory based communication library☆21Updated 2 weeks ago
- cuASR: CUDA Algebra for Semirings☆34Updated 2 years ago
- A library for constructing allocators and memory pools. It also contains broadly useful abstractions and utilities for memory management.…☆40Updated this week
- Random number library that generates pseudo-random and quasi-random numbers.☆24Updated this week
- This is the repository for an I/O benchmark that represents scientific deep learning workloads.☆23Updated last year
- A unified framework across multiple programming platforms☆33Updated 5 months ago
- CUDA 12.2 HMM demos☆17Updated 3 months ago
- Unit benchmarks of CUDA event APIs.☆17Updated 6 months ago
- pytorch ucc plugin☆17Updated 3 years ago
- The ultimate memory bandwidth benchmark☆46Updated last year
- Thallium is a C++14 library wrapping Margo, Mercury, and Argobots and providing an object-oriented way to use these libraries.☆12Updated 3 weeks ago
- AMD’s C++ library for accelerating tensor primitives☆35Updated this week
- ROCm BLAS marshalling library☆121Updated this week