ROCm / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
☆19 Updated last week
Related projects
Alternatives and complementary repositories for apex
- RCCL Performance Benchmark Tests ☆49 Updated 2 weeks ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆59 Updated 6 years ago
- Benchmarks to capture important workloads. ☆28 Updated 5 months ago
- ☆55 Updated 5 months ago
- ☆47 Updated 2 weeks ago
- ☆162 Updated 4 months ago
- High-speed GEMV kernels with up to 2.7x speedup over the PyTorch baseline. ☆87 Updated 4 months ago
- Ahead of Time (AOT) Triton Math Library ☆40 Updated this week
- Research and development for optimizing transformers ☆124 Updated 3 years ago
- oneCCL Bindings for PyTorch* ☆86 Updated 2 weeks ago
- ☆12 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 Updated this week
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆96 Updated this week
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆196 Updated 2 weeks ago
- ☆11 Updated last month
- MLPerf™ logging library ☆30 Updated last week
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆18 Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆41 Updated this week
- ☆79 Updated 2 months ago
- ☆156 Updated last year
- Development repository for the Triton language and compiler ☆93 Updated this week
- ☆16 Updated last week
- PyTorch bindings for CUTLASS grouped GEMM. ☆53 Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆57 Updated 2 months ago
- ☆48 Updated 8 months ago
- Bandwidth test for ROCm ☆47 Updated last week
- Applied AI experiments and examples for PyTorch ☆160 Updated last week
- Experimental projects related to TensorRT ☆78 Updated this week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆65 Updated last year
- CUDA 12.2 HMM demos ☆17 Updated 3 months ago