artyom-beilis / dlprimitives
Deep Learning Primitives and Mini-Framework for OpenCL
☆197 · Updated 8 months ago
Alternatives and similar repositories for dlprimitives
Users who are interested in dlprimitives are comparing it to the libraries listed below.
- DLPrimitives/OpenCL out-of-tree backend for PyTorch ☆350 · Updated 9 months ago
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆280 · Updated this week
- Development repository for the Triton language and compiler ☆122 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆401 · Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆585 · Updated this week
- Implementation of OpenCL 3.0 on Vulkan ☆394 · Updated this week
- ☆258 · Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆192 · Updated 3 months ago
- A collection of examples for the ROCm software stack ☆216 · Updated this week
- A tool which profiles OpenCL devices to find their peak capacities ☆450 · Updated last week
- Stretching GPU performance for GEMMs and tensor contractions. ☆242 · Updated last week
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆132 · Updated this week
- Next generation BLAS implementation for ROCm platform ☆380 · Updated this week
- Fork of LLVM to support AMD AIEngine processors ☆143 · Updated this week
- AMD's graph optimization engine. ☆220 · Updated this week
- ☆106 · Updated last month
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 · Updated 2 weeks ago
- OpenAI Triton backend for Intel® GPUs ☆189 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆97 · Updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆401 · Updated 4 months ago
- AI Tensor Engine for ROCm ☆201 · Updated this week
- OpenCL/SPIR-V implementation of HIP ☆104 · Updated 2 years ago
- ☆136 · Updated this week
- rocWMMA ☆114 · Updated last week
- ☆60 · Updated last year
- 8-bit CUDA functions for PyTorch ☆53 · Updated 3 weeks ago
- Tuned OpenCL BLAS ☆1,108 · Updated last month
- ROCm BLAS marshalling library ☆142 · Updated this week
- CUDA Kernel Benchmarking Library ☆656 · Updated last week
- High-Performance SGEMM on CUDA devices ☆94 · Updated 4 months ago