ROCm / aiter
AI Tensor Engine for ROCm
☆207 · Updated this week
Alternatives and similar repositories for aiter
Users who are interested in aiter compare it to the libraries listed below.
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆103 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆190 · Updated this week
- An experimental CPU backend for Triton ☆126 · Updated 2 weeks ago
- Development repository for the Triton language and compiler ☆125 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆423 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆245 · Updated this week
- Perplexity GPU Kernels ☆364 · Updated last week
- Ahead of Time (AOT) Triton Math Library ☆66 · Updated this week
- ROCm BLAS marshalling library ☆144 · Updated this week
- Fast low-bit matmul kernels in Triton ☆322 · Updated this week
- RCCL Performance Benchmark Tests ☆67 · Updated last month
- ☆115 · Updated last month
- rocWMMA ☆115 · Updated this week
- ☆25 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆84 · Updated this week
- ☆38 · Updated this week
- ☆107 · Updated last week
- ☆91 · Updated 5 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆98 · Updated last month
- ☆148 · Updated this week
- Extensible collectives library in Triton ☆86 · Updated 2 months ago
- ROCm Communication Collectives Library (RCCL) ☆342 · Updated this week
- ☆62 · Updated 6 months ago
- ☆81 · Updated 7 months ago
- amdgpu example code in HIP/asm ☆32 · Updated last week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆27 · Updated 3 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆61 · Updated 3 months ago
- ☆212 · Updated 11 months ago
- Next generation BLAS implementation for ROCm platform ☆381 · Updated this week
- Experimental projects related to TensorRT ☆105 · Updated this week