ROCm / aiter
AI Tensor Engine for ROCm
☆119 · Updated this week
Alternatives and similar repositories for aiter:
Users who are interested in aiter are comparing it to the libraries listed below:
- Development repository for the Triton language and compiler · ☆114 · Updated this week
- OpenAI Triton backend for Intel® GPUs · ☆170 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… · ☆83 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators · ☆370 · Updated this week
- An experimental CPU backend for Triton · ☆101 · Updated this week
- Shared Middle-Layer for Triton Compilation · ☆233 · Updated 2 weeks ago
- Stretching GPU performance for GEMMs and tensor contractions. · ☆234 · Updated last week
- collection of benchmarks to measure basic GPU capabilities · ☆309 · Updated last month
- Ahead of Time (AOT) Triton Math Library · ☆56 · Updated last week
- ☆21 · Updated last month
- ROCm Communication Collectives Library (RCCL) · ☆308 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… · ☆62 · Updated 3 weeks ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. · ☆99 · Updated last month
- rocWMMA · ☆104 · Updated last week
- Experimental projects related to TensorRT · ☆94 · Updated this week
- Fast and memory-efficient exact attention · ☆162 · Updated this week
- ☆57 · Updated 3 months ago
- ☆25 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") · ☆314 · Updated this week
- ☆138 · Updated this week
- ☆90 · Updated 2 weeks ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats · ☆212 · Updated 6 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications · ☆315 · Updated this week
- ☆22 · Updated last week
- RCCL Performance Benchmark Tests · ☆60 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆69 · Updated this week
- ☆193 · Updated 8 months ago
- Applied AI experiments and examples for PyTorch · ☆250 · Updated last week
- Fast low-bit matmul kernels in Triton · ☆272 · Updated this week
- ☆61 · Updated 3 months ago