ROCm / AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
☆11, updated 8 months ago
Alternatives and similar repositories for AITemplate:
Users who are interested in AITemplate are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention (☆159, updated this week)
- OpenAI Triton backend for Intel® GPUs (☆165, updated this week)
- Ahead of Time (AOT) Triton Math Library (☆54, updated this week)
- Development repository for the Triton language and compiler (☆108, updated this week)
- (☆19, updated last week)
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators (☆355, updated this week)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆67, updated this week)
- AMD's graph optimization engine. (☆210, updated this week)
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… (☆37, updated 6 months ago)
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… (☆61, updated 2 months ago)
- (☆20, updated this week)
- (☆19, updated 3 months ago)
- (☆34, updated this week)
- oneCCL Bindings for PyTorch* (☆89, updated 2 months ago)
- RCCL Performance Benchmark Tests (☆59, updated last month)
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). (☆237, updated 4 months ago)
- High-speed GEMV kernels, with up to 2.7x speedup compared to a PyTorch baseline. (☆100, updated 7 months ago)
- (☆185, updated 7 months ago)
- (☆75, updated last week)
- ROCm Communication Collectives Library (RCCL) (☆303, updated this week)
- Intel® Tensor Processing Primitives extension for PyTorch* (☆11, updated this week)
- A collection of examples for the ROCm software stack (☆186, updated this week)
- A collection of benchmarks to measure basic GPU capabilities (☆303, updated 2 weeks ago)
- (☆71, updated 3 months ago)
- (☆60, updated 2 months ago)
- (☆48, updated 2 months ago)
- Python package of rocm-smi-lib (☆20, updated 5 months ago)
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it (☆508, updated last month)
- (☆105, updated 3 months ago)