NVIDIA / atex
A TensorFlow Extension: GPU performance tools for TensorFlow.
☆25 · Updated last year
Related projects
Alternatives and complementary repositories for atex
- Common source, scripts and utilities shared across all Triton repositories. ☆62 · Updated this week
- A GPU-driven system framework for scalable AI applications ☆109 · Updated last month
- A tool for bandwidth measurements on NVIDIA GPUs. ☆321 · Updated last month
- An extension library of WMMA API (Tensor Core API) ☆84 · Updated 4 months ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆206 · Updated this week
- Benchmarks to capture important workloads. ☆28 · Updated 5 months ago
- Fast and memory-efficient exact attention ☆30 · Updated 3 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆271 · Updated this week
- PyTorch distributed training acceleration framework ☆34 · Updated this week
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆133 · Updated 2 months ago
- Experimental projects related to TensorRT ☆81 · Updated this week
- A visualization tool to display TF-Grappler optimized op graph ☆12 · Updated 2 years ago
- Computation using data flow graphs for scalable machine learning ☆67 · Updated this week
- Bandwidth test for ROCm ☆49 · Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆206 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆191 · Updated this week
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles. ☆156 · Updated this week
- MLPerf™ logging library ☆30 · Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆95 · Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency ☆98 · Updated 2 months ago
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆37 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆98 · Updated last week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆250 · Updated this week
- The Triton backend for the PyTorch TorchScript models. ☆127 · Updated this week
- ROCm Tracer Callback/Activity Library for performance tracing of AMD GPUs ☆75 · Updated last week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆307 · Updated this week