NVIDIA / atex
A TensorFlow Extension: GPU performance tools for TensorFlow.
☆25 · Updated last year
Related projects:
- A GPU-driven system framework for scalable AI applications ☆103 · Updated this week
- Benchmarks to capture important workloads. ☆28 · Updated 3 months ago
- Open deep learning compiler stack for CPU, GPU and specialized accelerators ☆33 · Updated last year
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆188 · Updated 3 weeks ago
- ☆80 · Updated 3 months ago
- A tracing JIT for PyTorch ☆18 · Updated 2 years ago
- An extension library of WMMA API (Tensor Core API) ☆81 · Updated 2 months ago
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆36 · Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency ☆93 · Updated last week
- ☆48 · Updated 6 months ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆126 · Updated 3 weeks ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆92 · Updated last year
- A tool for bandwidth measurements on NVIDIA GPUs. ☆285 · Updated 3 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆250 · Updated this week
- ☆50 · Updated 3 months ago
- Bandwidth test for ROCm ☆45 · Updated this week
- An IR for efficiently simulating distributed ML computation. ☆24 · Updated 8 months ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 3 years ago
- Training material for Nsight developer tools ☆125 · Updated last month
- Computation using data flow graphs for scalable machine learning ☆65 · Updated this week
- Common source, scripts and utilities shared across all Triton repositories. ☆62 · Updated last week
- The Triton backend for the PyTorch TorchScript models. ☆117 · Updated last week
- High-speed GEMV kernels, achieving up to a 2.7x speedup over the PyTorch baseline. ☆81 · Updated 2 months ago
- Fast and memory-efficient exact attention ☆20 · Updated 2 weeks ago
- oneCCL Bindings for Pytorch* ☆83 · Updated last week
- Development repository for the Triton language and compiler ☆86 · Updated this week
- An experimental CPU backend for Triton ☆36 · Updated last week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆279 · Updated last week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆48 · Updated this week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆233 · Updated this week