ROCm / TransformerEngine
☆15 · Updated this week
Alternatives and similar repositories for TransformerEngine:
Users who are interested in TransformerEngine are comparing it to the libraries listed below.
- An extension library of the WMMA API (Tensor Core API) ☆87 · Updated 6 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆36 · Updated 5 months ago
- RCCL Performance Benchmark Tests ☆55 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆37 · Updated 8 months ago
- ☆57 · Updated 7 months ago
- ☆66 · Updated 3 weeks ago
- Bandwidth test for ROCm ☆52 · Updated this week
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆15 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆49 · Updated this week
- ☆48 · Updated 10 months ago
- Development repository for the Triton language and compiler ☆102 · Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆75 · Updated this week
- ☆27 · Updated 3 weeks ago
- ☆35 · Updated last month
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆78 · Updated this week
- Extensible collectives library in Triton ☆76 · Updated 3 months ago
- FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆51 · Updated 4 months ago
- High-speed GEMV kernels, with up to a 2.7x speedup over the PyTorch baseline. ☆93 · Updated 6 months ago
- ☆38 · Updated 4 years ago
- ☆64 · Updated 2 months ago
- ☆21 · Updated last week
- A standalone GEMM kernel for fp16 activations and quantized weights, extracted from FasterTransformer ☆87 · Updated 10 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆99 · Updated 4 months ago
- An experimental CPU backend for Triton ☆75 · Updated this week
- rocWMMA ☆97 · Updated this week
- ☆23 · Updated 10 months ago
- Fast GPU based tensor core reductions ☆13 · Updated 2 years ago
- OpenAI Triton backend for Intel® GPUs ☆154 · Updated this week
- Fast and memory-efficient exact attention ☆41 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆72 · Updated this week