NVIDIA / gpu_affinity
GPU Affinity is a package that automatically sets the CPU process affinity to match the hardware architecture of a given platform.
☆22 Updated last year
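To make the idea of CPU process affinity concrete, here is a minimal sketch that pins the current process to a set of CPU cores. It does not use or reproduce the gpu_affinity API. The helper names (`parse_cpulist`, `pin_to_cpus`) and the GPU-to-core mapping are hypothetical and chosen only for illustration. On a real system the core list would come from the platform topology. The sketch assumes Linux, where `os.sched_setaffinity` is available.

```python
import os


def parse_cpulist(cpulist: str) -> set[int]:
    """Parse a Linux-style cpulist string such as "0-3,8-11" into CPU ids."""
    cpus: set[int] = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_cpus(cpus: set[int]) -> bool:
    """Pin the current process to `cpus`.

    Returns False when the set is empty or the platform lacks
    os.sched_setaffinity (for example macOS or Windows).
    """
    if not cpus or not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return True


# Hypothetical mapping from GPU index to the cores nearest that GPU.
# These values are made up; they are not read from any real machine.
GPU_TO_CPULIST = {0: "0-3,8-11", 1: "4-7,12-15"}

# Example call, left commented out because the cores above may not exist
# on the machine running this sketch:
# pin_to_cpus(parse_cpulist(GPU_TO_CPULIST[0]))
```

A process that drives a given GPU would call `pin_to_cpus` once at startup, before it spawns worker threads, so that those threads inherit the same core set.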
Alternatives and similar repositories for gpu_affinity
Users who are interested in gpu_affinity are comparing it to the libraries listed below.
- CloudAI Benchmark Framework ☆64 Updated this week
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- An extension library of WMMA API (Tensor Core API) ☆96 Updated 10 months ago
- Training material for Nsight developer tools ☆157 Updated 9 months ago
- oneCCL Bindings for Pytorch* ☆97 Updated 3 weeks ago
- Test data for DALI project ☆42 Updated 2 months ago
- GPU Stress Test is a tool to stress the compute engine of NVIDIA Tesla GPU’s by running a BLAS matrix multiply using different data types… ☆91 Updated last month
- Python bindings for NVTX ☆66 Updated last year
- RCCL Performance Benchmark Tests ☆64 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆237 Updated last week
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆63 Updated last month
- kmeans clustering with multi-GPU capabilities ☆119 Updated 2 years ago
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions ☆26 Updated last week
- oneAPI Collective Communications Library (oneCCL) ☆232 Updated last week
- ☆32 Updated last week
- A utility for stressing GPUs by driving utilization (and thus power consumption) up and down in user-defined cycle intervals. It will als… ☆24 Updated 2 years ago
- ☆69 Updated last month
- Fast and memory-efficient exact attention ☆68 Updated last week
- C99/C++ header-only library for division via fixed-point multiplication by inverse ☆51 Updated last year
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆81 Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated this week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆108 Updated 8 months ago
- AMD's graph optimization engine. ☆217 Updated this week
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆38 Updated last year
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆116 Updated last year
- pytorch-profiler ☆51 Updated last year
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆35 Updated 3 months ago