ROCm / AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
☆12 · Updated last year
Alternatives and similar repositories for AITemplate
Users who are interested in AITemplate are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention ☆208 · Updated this week
- Development repository for the Triton language and compiler ☆140 · Updated this week
- AI Tensor Engine for ROCm ☆341 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆224 · Updated this week
- ☆29 · Updated 3 months ago
- ☆57 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆113 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆87 · Updated last week
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆63 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆515 · Updated this week
- 8-bit CUDA functions for PyTorch ☆69 · Updated 3 months ago
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆50 · Updated last year
- A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch ☆25 · Updated last month
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆276 · Updated 6 months ago
- Intel® Tensor Processing Primitives extension for Pytorch* ☆17 · Updated this week
- AMD's graph optimization engine. ☆271 · Updated this week
- ☆256 · Updated last year
- ☆72 · Updated this week
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it ☆673 · Updated last month
- ☆61 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆113 · Updated this week
- GitHub mirror of the triton-lang/triton repo. ☆124 · Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆124 · Updated 2 months ago
- A collection of examples for the ROCm software stack ☆273 · Updated this week
- ROCm Communication Collectives Library (RCCL) ☆410 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆63 · Updated 6 months ago
- monorepo for rocm libraries ☆225 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆138 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆234 · Updated this week
- ☆107 · Updated this week