ROCm / triton
Development repository for the Triton language and compiler
☆118 Updated this week
Alternatives and similar repositories for triton:
Users who are interested in triton are comparing it to the libraries listed below
- Ahead of Time (AOT) Triton Math Library ☆63 Updated 2 weeks ago
- OpenAI Triton backend for Intel® GPUs ☆183 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆393 Updated this week
- ☆142 Updated this week
- AI Tensor Engine for ROCm ☆187 Updated this week
- Fast and memory-efficient exact attention ☆173 Updated this week
- Shared Middle-Layer for Triton Compilation ☆246 Updated 2 weeks ago
- ☆30 Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 Updated 2 months ago
- An experimental CPU backend for Triton ☆110 Updated last week
- ☆202 Updated 9 months ago
- ☆104 Updated last month
- MLIR-based partitioning system ☆82 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆93 Updated this week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆106 Updated 9 months ago
- ☆50 Updated last year
- ☆78 Updated 6 months ago
- ☆70 Updated 4 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆324 Updated this week
- RCCL Performance Benchmark Tests ☆64 Updated last week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆82 Updated this week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆221 Updated 3 weeks ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆24 Updated last month
- rocWMMA ☆110 Updated last week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆134 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆76 Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆84 Updated last week
- Tenstorrent MLIR compiler ☆122 Updated this week
- ☆96 Updated last year
- ☆202 Updated 2 weeks ago