NVIDIA / atex
A TensorFlow Extension: GPU performance tools for TensorFlow.
☆25 Updated last year
Alternatives and similar repositories for atex:
Users who are interested in atex are comparing it to the libraries listed below:
- A GPU-driven system framework for scalable AI applications ☆112 Updated 3 weeks ago
- Training material for Nsight developer tools ☆149 Updated 6 months ago
- PyTorch distributed training acceleration framework ☆43 Updated 2 weeks ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆378 Updated 3 weeks ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆139 Updated this week
- An extension library of WMMA API (Tensor Core API) ☆90 Updated 7 months ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 Updated 2 weeks ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆223 Updated 3 weeks ago
- oneAPI Collective Communications Library (oneCCL) ☆223 Updated last month
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆106 Updated 3 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆105 Updated 5 months ago
- End to End steps for adding custom ops in PyTorch. ☆20 Updated 4 years ago
- ☆105 Updated 3 months ago
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆38 Updated last month
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆310 Updated this week
- 📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA. ☆123 Updated last week
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆82 Updated last year
- Benchmarks to capture important workloads. ☆29 Updated last month
- Fast and memory-efficient exact attention ☆47 Updated this week
- The Triton backend for the ONNX Runtime. ☆139 Updated this week
- ☆75 Updated last week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆304 Updated this week
- A tracing JIT for PyTorch ☆17 Updated 2 years ago
- AMD's graph optimization engine. ☆210 Updated this week
- Bandwidth test for ROCm ☆54 Updated 2 weeks ago
- ☆49 Updated last year
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆350 Updated 2 weeks ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆71 Updated this week
- AMD’s C++ library for accelerating tensor primitives ☆38 Updated this week
- ☆71 Updated 3 months ago