artyom-beilis / pytorch_dlprim
DLPrimitives/OpenCL out-of-tree backend for PyTorch
☆287 Updated 2 months ago
Related projects
Alternatives and complementary repositories for pytorch_dlprim
- Deep Learning Primitives and Mini-Framework for OpenCL ☆175 Updated 2 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆523 Updated this week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆227 Updated this week
- Development repository for the Triton language and compiler ☆93 Updated this week
- ☆231 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆313 Updated this week
- A collection of examples for the ROCm software stack ☆167 Updated this week
- ☆318 Updated last week
- Next generation BLAS implementation for ROCm platform ☆346 Updated this week
- Tuned OpenCL BLAS ☆1,063 Updated last week
- Stretching GPU performance for GEMMs and tensor contractions. ☆223 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆63 Updated this week
- ☆399 Updated this week
- A small OpenCL benchmark program to measure peak GPU/CPU performance. ☆164 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 Updated this week
- AMD's graph optimization engine. ☆186 Updated this week
- A tool which profiles OpenCL devices to find their peak capacities ☆411 Updated 2 weeks ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆405 Updated last year
- A Python package for extending the official PyTorch that can easily obtain performance on Intel platform ☆1,624 Updated this week
- ☆101 Updated this week
- build scripts for ROCm ☆181 Updated 10 months ago
- An implementation of BLAS using the SYCL open standard. ☆259 Updated 2 weeks ago
- Intel® NPU Acceleration Library ☆507 Updated this week
- Intel® Extension for TensorFlow* ☆318 Updated last month
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆767 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆147 Updated last month
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,355 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆270 Updated this week
- ☆58 Updated last year
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆409 Updated this week