ROCm / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
☆19 · Updated this week
Related projects
Alternatives and complementary repositories for apex
- RCCL Performance Benchmark Tests ☆50 · Updated 3 weeks ago
- Ahead of Time (AOT) Triton Math Library ☆41 · Updated this week
- oneCCL Bindings for Pytorch* ☆86 · Updated 3 weeks ago
- ☆13 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆59 · Updated 6 years ago
- ☆169 · Updated 4 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆98 · Updated last week
- ROCm Communication Collectives Library (RCCL) ☆270 · Updated this week
- High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline. ☆90 · Updated 4 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆321 · Updated last month
- Assembler for NVIDIA Volta and Turing GPUs ☆201 · Updated 2 years ago
- ☆48 · Updated this week
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). ☆209 · Updated 3 weeks ago
- ☆55 · Updated 5 months ago
- ☆88 · Updated 2 months ago
- ☆16 · Updated last week
- ☆79 · Updated 2 months ago
- A library of GPU kernels for sparse matrix operations. ☆249 · Updated 3 years ago
- Applied AI experiments and examples for PyTorch ☆166 · Updated 3 weeks ago
- Experimental projects related to TensorRT ☆81 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆43 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆313 · Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆30 · Updated 3 months ago
- ☆48 · Updated 8 months ago
- An extension library of WMMA API (Tensor Core API) ☆84 · Updated 4 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆57 · Updated 2 months ago
- ☆12 · Updated last month
- ☆14 · Updated last month
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆271 · Updated this week