NVIDIA / gpu_affinity
GPU Affinity is a package that automatically sets CPU process affinity to match the hardware architecture of a given platform.
☆28 Updated 2 years ago
Alternatives and similar repositories for gpu_affinity
Users who are interested in gpu_affinity compare it to the libraries listed below.
- oneCCL Bindings for Pytorch* (deprecated) ☆104 Updated last month
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆85 Updated last year
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆91 Updated 2 years ago
- Training material for Nsight developer tools ☆173 Updated last year
- A GPU-driven system framework for scalable AI applications ☆123 Updated 10 months ago
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport ☆72 Updated 7 months ago
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ☆60 Updated last week
- A tool for examining GPU scheduling behavior. ☆89 Updated last year
- Magnum IO community repo ☆105 Updated 3 weeks ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆41 Updated 10 years ago
- oneAPI Collective Communications Library (oneCCL) ☆252 Updated last week
- An extension library of WMMA API (Tensor Core API) ☆109 Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆109 Updated 8 years ago
- GPUDirect Async support for IB Verbs ☆133 Updated 3 years ago
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆17 Updated 9 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆104 Updated 5 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆235 Updated 3 years ago
- CUPTI GPU Profiler ☆40 Updated 6 years ago
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆122 Updated 2 years ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆138 Updated 7 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆256 Updated last week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆588 Updated 8 months ago
- Python bindings for NVTX ☆67 Updated 2 years ago
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions ☆35 Updated 3 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago
- CloudAI Benchmark Framework ☆76 Updated this week
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆191 Updated 10 months ago
- ☆71 Updated 9 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆103 Updated 7 years ago
- CUDA 12.2 HMM demos ☆20 Updated last year