amd / ZenDNN-pytorch
☆9 Updated last year
Alternatives and similar repositories for ZenDNN-pytorch
Users who are interested in ZenDNN-pytorch are comparing it to the libraries listed below.
- ☆106 Updated last month
- The AMD rocAL is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a… ☆17 Updated this week
- ☆11 Updated 6 months ago
- ☆19 Updated 2 weeks ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆64 Updated this week
- ONNX Runtime: cross-platform, high performance scoring engine for ML models ☆65 Updated this week
- HIPCC: HIP compiler driver ☆40 Updated last year
- OpenVINO Intel NPU Compiler ☆56 Updated this week
- ☆20 Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated last week
- AMD's graph optimization engine. ☆220 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆189 Updated this week
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆41 Updated 4 months ago
- GPU Stress Test is a tool to stress the compute engine of NVIDIA Tesla GPUs by running a BLAS matrix multiply using different data types… ☆94 Updated last month
- Compass Optimizer (OPT for short) is part of the Zhouyi Compass Neural Network Compiler. The OPT is designed for converting the float In… ☆29 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆79 Updated this week
- AI Tensor Engine for ROCm ☆201 Updated this week
- Bandwidth test for ROCm ☆56 Updated 2 weeks ago
- A collection of examples for the ROCm software stack ☆216 Updated this week
- Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes. ☆59 Updated last year
- ☆46 Updated this week
- ☆11 Updated this week
- rocDecode is a high performance video decode SDK for AMD hardware ☆26 Updated this week
- ☆45 Updated 11 months ago
- ☆60 Updated last year
- Stretching GPU performance for GEMMs and tensor contractions. ☆242 Updated last week
- Library for modelling performance costs of different Neural Network workloads on NPU devices ☆34 Updated last month
- ☆136 Updated this week
- A C++ port of karpathy/llm.c that features a tiny torch library while maintaining overall simplicity. ☆33 Updated 10 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…