intel / intel-extension-for-openxla
☆44 Updated last month
Alternatives and similar repositories for intel-extension-for-openxla:
Users who are interested in intel-extension-for-openxla are comparing it to the libraries listed below.
- ☆34 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆168 Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆61 Updated last week
- ☆60 Updated 2 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆313 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆39 Updated 10 months ago
- CUDA Templates for Linear Algebra Subroutines ☆14 Updated this week
- Ahead of Time (AOT) Triton Math Library ☆54 Updated this week
- oneCCL Bindings for Pytorch* ☆89 Updated last week
- oneAPI Collective Communications Library (oneCCL) ☆224 Updated last week
- ☆49 Updated last year
- MLIR-based partitioning system ☆71 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆79 Updated this week
- Experimental projects related to TensorRT ☆89 Updated this week
- Bandwidth test for ROCm ☆54 Updated this week
- An experimental CPU backend for Triton ☆99 Updated this week
- ☆73 Updated 4 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 7 months ago
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆100 Updated 8 months ago
- Stores documents and resources used by the OpenXLA developer community ☆117 Updated 7 months ago
- Shared Middle-Layer for Triton Compilation ☆230 Updated this week
- ☆137 Updated this week
- ☆51 Updated 7 months ago
- An extension library of WMMA API (Tensor Core API) ☆91 Updated 8 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆130 Updated this week
- Development repository for the Triton language and compiler ☆109 Updated this week
- rocWMMA ☆102 Updated this week
- High-Performance SGEMM on CUDA devices ☆86 Updated last month
- RCCL Performance Benchmark Tests ☆59 Updated this week