artyom-beilis / dlprimitives
Deep Learning Primitives and Mini-Framework for OpenCL
☆175 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for dlprimitives
- DLPrimitives/OpenCL out-of-tree backend for PyTorch ☆287 · Updated 2 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆523 · Updated this week
- Development repository for the Triton language and compiler ☆93 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆313 · Updated this week
- An implementation of BLAS using the SYCL open standard. ☆259 · Updated 2 weeks ago
- Tuned OpenCL BLAS ☆1,063 · Updated last week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆227 · Updated this week
- Next generation BLAS implementation for ROCm platform ☆346 · Updated this week
- AMD's graph optimization engine. ☆186 · Updated this week
- A tool which profiles OpenCL devices to find their peak capacities ☆411 · Updated 2 weeks ago
- Nod.ai 🦈 version of 👻. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ☆107 · Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆223 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- A small OpenCL benchmark program to measure peak GPU/CPU performance. ☆164 · Updated this week
- OpenCL/SPIR-V implementation of HIP ☆104 · Updated 2 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆147 · Updated last month
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆219 · Updated this week
- A collection of examples for the ROCm software stack ☆167 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆63 · Updated this week
- Implementation of OpenCL 3.0 on Vulkan ☆359 · Updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆363 · Updated 3 months ago
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆767 · Updated this week
- ROCm's Thunk Interface ☆83 · Updated 2 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆270 · Updated this week
- ROCm Communication Collectives Library (RCCL) ☆268 · Updated this week