artyom-beilis / pytorch_dlprim
DLPrimitives/OpenCL out-of-tree backend for PyTorch
☆317 Updated 5 months ago
Alternatives and similar repositories for pytorch_dlprim:
Users interested in pytorch_dlprim are comparing it to the libraries listed below.
- Deep Learning Primitives and Mini-Framework for OpenCL ☆187 Updated 5 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆552 Updated this week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆251 Updated this week
- AMD's graph optimization engine. ☆208 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆165 Updated this week
- Tuned OpenCL BLAS ☆1,084 Updated 3 months ago
- A collection of examples for the ROCm software stack ☆186 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆349 Updated this week
- Next generation BLAS implementation for ROCm platform ☆359 Updated this week
- Development repository for the Triton language and compiler ☆107 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆166 Updated last week
- ☆117 Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆263 Updated last month
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆79 Updated this week
- rocWMMA ☆100 Updated this week
- ☆60 Updated 2 months ago
- ☆371 Updated this week
- A tool which profiles OpenCL devices to find their peak capacities ☆430 Updated last month
- An open-source efficient deep learning framework/compiler, written in python. ☆681 Updated last week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆523 Updated last week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆426 Updated last year
- ☆58 Updated last year
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆221 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆233 Updated this week
- Intel® Extension for TensorFlow* ☆329 Updated last month
- ☆105 Updated 3 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆212 Updated 3 years ago
- Backward compatible ML compute opset inspired by HLO/MHLO ☆446 Updated last week
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,424 Updated this week
- ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime ☆235 Updated this week