microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆465 Updated last month
Alternatives and similar repositories for antares:
Users who are interested in antares are comparing it to the libraries listed below:
- Stretching GPU performance for GEMMs and tensor contractions. ☆234 Updated 2 weeks ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆373 Updated this week
- AMD's graph optimization engine. ☆213 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆313 Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆567 Updated this week
- A collection of examples for the ROCm software stack ☆198 Updated this week
- Next generation BLAS implementation for ROCm platform ☆361 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆214 Updated 3 years ago
- ROCm Communication Collectives Library (RCCL) ☆309 Updated this week
- CUDA Kernel Benchmarking Library ☆596 Updated 3 weeks ago
- ☆250 Updated this week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆465 Updated last year
- OpenAI Triton backend for Intel® GPUs ☆172 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆227 Updated this week
- Shared Middle-Layer for Triton Compilation ☆236 Updated this week
- collection of benchmarks to measure basic GPU capabilities ☆340 Updated last month
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆365 Updated this week
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it ☆534 Updated last week
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆374 Updated 6 months ago
- ROCm BLAS marshalling library ☆136 Updated this week
- Development repository for the Triton language and compiler ☆114 Updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆391 Updated 2 months ago
- Microsoft Collective Communication Library ☆344 Updated last year
- A home for the final text of all TVM RFCs. ☆102 Updated 6 months ago
- ☆106 Updated 3 weeks ago
- An extension library of WMMA API (Tensor Core API) ☆93 Updated 8 months ago
- ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime ☆238 Updated this week
- rocWMMA ☆105 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆84 Updated this week
- ROCm Parallel Primitives ☆171 Updated last week