ROCm / aiter
AI Tensor Engine for ROCm
☆254Updated this week
Alternatives and similar repositories for aiter
Users who are interested in aiter are comparing it to the libraries listed below.
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators☆447Updated this week
- OpenAI Triton backend for Intel® GPUs☆200Updated last week
- Development repository for the Triton language and compiler☆127Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆111Updated this week
- An experimental CPU backend for Triton☆143Updated 2 months ago
- Ahead of Time (AOT) Triton Math Library☆75Updated this week
- Shared Middle-Layer for Triton Compilation☆268Updated last week
- ☆41Updated last week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆112Updated 3 months ago
- Perplexity GPU Kernels☆435Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆247Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")☆348Updated this week
- collection of benchmarks to measure basic GPU capabilities☆408Updated 6 months ago
- ☆148Updated last week
- Tilus is a tile-level kernel programming language, implemented in Python.☆115Updated 2 weeks ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…☆59Updated 2 months ago
- ☆135Updated 3 months ago
- ☆106Updated 7 months ago
- Fast and memory-efficient exact attention☆182Updated last week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆31Updated 5 months ago
- Fast low-bit matmul kernels in Triton☆349Updated this week
- Experimental projects related to TensorRT☆110Updated last week
- ☆26Updated 2 months ago
- amdgpu example code in hip/asm☆38Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆91Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer (WIP) for Triton Kernels☆144Updated this week
- ROCm Communication Collectives Library (RCCL)☆360Updated this week
- ☆86Updated 9 months ago
- RCCL Performance Benchmark Tests☆73Updated 3 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆44Updated 5 months ago