NVIDIA / gpu_affinity
GPU Affinity is a package that automatically sets the CPU process affinity to match the hardware architecture of a given platform.
☆27 Updated last year
Alternatives and similar repositories for gpu_affinity
Users who are interested in gpu_affinity compare it to the libraries listed below.
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport ☆63 Updated 5 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆88 Updated last year
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆39 Updated 2 months ago
- oneCCL Bindings for Pytorch* ☆102 Updated 2 months ago
- ☆27 Updated 7 months ago
- An extension of rCUDA that enables remote-to-local GPU migration ☆39 Updated 9 years ago
- An extension library of WMMA API (Tensor Core API) ☆106 Updated last year
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆17 Updated 8 years ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆120 Updated 5 months ago
- Magnum IO community repo ☆99 Updated last month
- CUPTI GPU Profiler ☆40 Updated 6 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago
- oneAPI Collective Communications Library (oneCCL) ☆245 Updated 2 weeks ago
- CUDA 12.2 HMM demos ☆20 Updated last year
- A tool for examining GPU scheduling behavior. ☆88 Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆101 Updated 7 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆59 Updated 3 years ago
- Microsoft Collective Communication Library ☆66 Updated 10 months ago
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend ☆76 Updated 2 weeks ago
- study of cutlass ☆22 Updated 11 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆97 Updated 3 months ago
- RCCL Performance Benchmark Tests ☆77 Updated last week
- pytorch-profiler ☆51 Updated 2 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆37 Updated last year
- GPUDirect Async support for IB Verbs ☆130 Updated 2 years ago
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ☆30 Updated last month
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆107 Updated 8 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆111 Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability ☆41 Updated 10 years ago