intel / intel-extension-for-openxla
☆47Updated 3 weeks ago
Alternatives and similar repositories for intel-extension-for-openxla:
Users who are interested in intel-extension-for-openxla are comparing it to the libraries listed below.
- ☆44Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…☆62Updated 2 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")☆324Updated this week
- OpenAI Triton backend for Intel® GPUs☆183Updated this week
- Ahead of Time (AOT) Triton Math Library☆63Updated 2 weeks ago
- ☆60Updated 4 months ago
- MLIR-based partitioning system☆82Updated this week
- oneCCL Bindings for Pytorch*☆95Updated 2 weeks ago
- ☆50Updated last year
- Stores documents and resources used by the OpenXLA developer community☆121Updated 9 months ago
- A CUTLASS implementation using SYCL☆20Updated this week
- Experimental projects related to TensorRT☆99Updated this week
- RCCL Performance Benchmark Tests☆64Updated last week
- ☆30Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona…☆93Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆40Updated last month
- ☆142Updated this week
- oneAPI Collective Communications Library (oneCCL)☆232Updated last week
- An extension library of WMMA API (Tensor Core API)☆96Updated 9 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆84Updated last week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.☆80Updated this week
- Bandwidth test for ROCm☆54Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆90Updated last month
- An experimental CPU backend for Triton☆110Updated last week
- High-Performance SGEMM on CUDA devices☆90Updated 3 months ago
- rocWMMA☆110Updated this week
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆60Updated last month
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.☆134Updated last week
- ☆106Updated 3 weeks ago
- Benchmarks to capture important workloads.☆31Updated 3 months ago