artyom-beilis / dlprimitives
Deep Learning Primitives and Mini-Framework for OpenCL
☆200 Updated last year
Alternatives and similar repositories for dlprimitives
Users who are interested in dlprimitives are comparing it to the libraries listed below.
- DLPrimitives/OpenCL out of tree backend for pytorch ☆368 Updated last year
- HIPIFY: Convert CUDA to Portable C++ Code ☆623 Updated this week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆298 Updated 3 weeks ago
- ☆269 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆383 Updated this week
- AMD's graph optimization engine. ☆249 Updated this week
- Implementation of OpenCL 3.0 on Vulkan ☆406 Updated last week
- ☆124 Updated last week
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆240 Updated this week
- A collection of examples for the ROCm software stack ☆244 Updated this week
- A tool which profiles OpenCL devices to find their peak capacities ☆467 Updated 3 months ago
- A small OpenCL benchmark program to measure peak GPU/CPU performance. ☆249 Updated last week
- Development repository for the Triton language and compiler ☆131 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆207 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆246 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆148 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆206 Updated 7 months ago
- Tuned OpenCL BLAS ☆1,142 Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆101 Updated last month
- ☆60 Updated 2 years ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆465 Updated this week
- ☆151 Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆262 Updated 8 months ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆419 Updated 8 months ago
- 8-bit CUDA functions for PyTorch ☆61 Updated 3 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆259 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆353 Updated this week
- High-Performance SGEMM on CUDA devices ☆101 Updated 8 months ago
- Tensor Tiling Library ☆37 Updated this week
- OpenCL/SPIR-V implementation of HIP ☆105 Updated 2 years ago