intel / intel-extension-for-openxla
☆44Updated last month
Alternatives and similar repositories for intel-extension-for-openxla:
Users who are interested in intel-extension-for-openxla are comparing it to the libraries listed below
- ☆35Updated this week
- OpenAI Triton backend for Intel® GPUs☆168Updated this week
- Ahead of Time (AOT) Triton Math Library☆54Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…☆62Updated 2 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")☆313Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆40Updated 10 months ago
- oneCCL Bindings for PyTorch*☆89Updated last week
- ☆60Updated 3 months ago
- ☆73Updated 4 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline.☆100Updated 8 months ago
- High-Performance SGEMM on CUDA devices☆86Updated last month
- ☆49Updated last year
- MLIR-based partitioning system☆72Updated this week
- rocWMMA☆102Updated this week
- ☆137Updated this week
- Extensible collectives library in Triton☆83Updated 5 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆79Updated last year
- ☆51Updated 7 months ago
- ROCm BLAS marshalling library☆133Updated this week
- Shared Middle-Layer for Triton Compilation☆232Updated last week
- Experiments and prototypes associated with IREE or MLIR☆50Updated 7 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores☆37Updated 7 months ago
- Collection of kernels written in Triton language☆111Updated last month
- Unified compiler/runtime for interfacing with PyTorch Dynamo.☆99Updated 3 weeks ago
- MatMul performance benchmarks for a single CPU core, comparing hand-engineered and codegen kernels.☆128Updated last year
- Stretching GPU performance for GEMMs and tensor contractions.☆233Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona…☆81Updated this week
- Development repository for the Triton language and compiler☆111Updated this week
- oneAPI Collective Communications Library (oneCCL)☆224Updated 2 weeks ago
- An experimental CPU backend for Triton☆99Updated last week