NVIDIA / gpu_affinity
GPU Affinity is a package to automatically set the CPU process affinity to match the hardware architecture on a given platform
☆29 Updated 2 years ago
Alternatives and similar repositories for gpu_affinity
Users who are interested in gpu_affinity are comparing it to the libraries listed below.
- oneCCL Bindings for Pytorch* (deprecated) ☆104 Updated last month
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆45 Updated this week
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ☆63 Updated last month
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆86 Updated last week
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport ☆73 Updated 8 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆84 Updated 2 weeks ago
- Ahead of Time (AOT) Triton Math Library ☆88 Updated last week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆93 Updated 2 years ago
- GPUDirect Async support for IB Verbs ☆135 Updated 3 years ago
- An extension library of WMMA API (Tensor Core API) ☆109 Updated last year
- Magnum IO community repo ☆109 Updated 2 months ago
- Training material for Nsight developer tools ☆178 Updated last year
- CUDA 12.2 HMM demos ☆20 Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability ☆41 Updated 10 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago
- A tool for examining GPU scheduling behavior. ☆91 Updated last year
- oneAPI Collective Communications Library (oneCCL) ☆254 Updated last week
- study of cutlass ☆22 Updated last year
- Bandwidth test for ROCm ☆75 Updated last week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆109 Updated 8 years ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆144 Updated this week
- Example of using PyTorch's open device registration API ☆31 Updated 3 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆105 Updated 7 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 Updated 7 months ago
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆36 Updated last year
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆90 Updated last month
- Python bindings for NVTX ☆67 Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆154 Updated 2 weeks ago
- A GPU-driven system framework for scalable AI applications ☆124 Updated last year
- ☆60 Updated this week