NVIDIA / gpu_affinity
GPU Affinity is a package that automatically sets the CPU process affinity to match the hardware architecture on a given platform.
☆19Updated last year
Alternatives and similar repositories for gpu_affinity:
Users who are interested in gpu_affinity are comparing it to the libraries listed below:
- ☆61Updated 3 months ago
- An extension library of WMMA API (Tensor Core API)☆93Updated 8 months ago
- oneCCL Bindings for Pytorch*☆91Updated this week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆104Updated 7 years ago
- kmeans clustering with multi-GPU capabilities☆120Updated last year
- RCCL Performance Benchmark Tests☆60Updated 3 weeks ago
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope…☆59Updated last week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs☆79Updated last week
- oneAPI Collective Communications Library (oneCCL)☆227Updated this week
- study of cutlass☆21Updated 4 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆87Updated 6 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …☆65Updated 3 years ago
- ☆49Updated last year
- AMD's graph optimization engine.☆213Updated this week
- portDNN is a library implementing neural network algorithms written using SYCL☆111Updated 10 months ago
- CVFusion is an open-source deep learning compiler to fuse the OpenCV operators.☆29Updated 2 years ago
- Training material for Nsight developer tools☆152Updated 7 months ago
- ☆26Updated this week
- A demo of how to write a high-performance convolution that runs on Apple silicon☆54Updated 3 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs☆106Updated last week
- Standalone Flash Attention v2 kernel without libtorch dependency☆108Updated 6 months ago
- CUPTI GPU Profiler☆37Updated 6 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆57Updated last week
- ☆60Updated 3 months ago
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions☆25Updated this week
- Example of using pytorch's open device registration API☆28Updated 2 years ago
- A tool for bandwidth measurements on NVIDIA GPUs.☆393Updated last month
- Python bindings for NVTX☆66Updated last year
- ☆66Updated 11 years ago
- GPU Stress Test is a tool to stress the compute engine of NVIDIA Tesla GPUs by running a BLAS matrix multiply using different data types…☆87Updated 5 months ago