amd / RyzenAI-SW
☆456 Updated last month
Alternatives and similar repositories for RyzenAI-SW:
Users who are interested in RyzenAI-SW are comparing it to the libraries listed below.
- ☆357 Updated last week
- Intel® NPU (Neural Processing Unit) Driver ☆215 Updated last month
- Intel® NPU Acceleration Library ☆593 Updated 2 weeks ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆542 Updated this week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆246 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆338 Updated this week
- Fork of LLVM to support AMD AIEngine processors ☆124 Updated this week
- A collection of examples for the ROCm software stack ☆179 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆157 Updated this week
- ☆173 Updated this week
- ☆115 Updated this week
- DLPrimitives/OpenCL out of tree backend for pytorch ☆307 Updated 4 months ago
- AMD related optimizations for transformer models ☆64 Updated 2 months ago
- AMD's graph optimization engine. ☆196 Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆436 Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆503 Updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆373 Updated 2 weeks ago
- cudnn_frontend provides a C++ wrapper for the cudnn backend API and samples on how to use it ☆488 Updated this week
- Development repository for the Triton language and compiler ☆104 Updated this week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆327 Updated this week
- ☆244 Updated this week
- Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYC… ☆455 Updated this week
- Next generation BLAS implementation for ROCm platform ☆359 Updated this week
- ☆58 Updated last year
- ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime ☆234 Updated this week
- ☆97 Updated this week
- Deep Learning Primitives and Mini-Framework for OpenCL ☆187 Updated 4 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆352 Updated 5 months ago
- A Python package for extending the official PyTorch that can easily obtain performance on Intel platform ☆1,716 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆73 Updated this week