ROCm / aiter
AI Tensor Engine for ROCm
☆187 · Updated this week
Alternatives and similar repositories for aiter:
Users who are interested in aiter are comparing it to the libraries listed below.
- OpenAI Triton backend for Intel® GPUs ☆183 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆393 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆93 · Updated this week
- Perplexity GPU Kernels ☆272 · Updated last week
- Development repository for the Triton language and compiler ☆118 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆63 · Updated 2 weeks ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆345 · Updated last week
- ROCm Communication Collectives Library (RCCL) ☆330 · Updated this week
- An experimental CPU backend for Triton ☆110 · Updated last week
- Fast low-bit matmul kernels in Triton ☆297 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆246 · Updated 2 weeks ago
- RCCL Performance Benchmark Tests ☆64 · Updated last week
- NVIDIA Inference Xfer Library (NIXL) ☆304 · Updated this week
- Applied AI experiments and examples for PyTorch ☆264 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆76 · Updated this week
- Experimental projects related to TensorRT ☆99 · Updated last week
- rocWMMA ☆110 · Updated last week
- amdgpu example code in hip/asm ☆31 · Updated 3 weeks ago
- ROCm BLAS marshalling library ☆140 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆324 · Updated this week
- Fast and memory-efficient exact attention ☆173 · Updated this week
- High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline ☆106 · Updated 9 months ago
- BitBLAS is a library that supports mixed-precision matrix multiplication, especially for quantized LLM deployment ☆601 · Updated last week
- Fastest kernels written from scratch ☆256 · Updated last month