microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆470 · Updated 2 months ago
Alternatives and similar repositories for antares
Users who are interested in antares are comparing it to the libraries listed below.
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆246 · Updated this week
- AMD's graph optimization engine. ☆223 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆339 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆427 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆191 · Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 · Updated last month
- ☆108 · Updated last week
- Shared Middle-Layer for Triton Compilation ☆256 · Updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆404 · Updated 5 months ago
- Backward compatible ML compute opset inspired by HLO/MHLO ☆494 · Updated last week
- ☆416 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆66 · Updated last week
- Next generation BLAS implementation for ROCm platform ☆382 · Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆133 · Updated last year
- Experimental projects related to TensorRT ☆105 · Updated last week
- ☆261 · Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆590 · Updated this week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆508 · Updated 2 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆222 · Updated 3 years ago
- CUDA Kernel Benchmarking Library ☆669 · Updated last week
- ROCm Communication Collectives Library (RCCL) ☆342 · Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆195 · Updated 4 months ago
- Stores documents and resources used by the OpenXLA developer community ☆124 · Updated 10 months ago
- ☆194 · Updated 2 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆134 · Updated this week
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆989 · Updated 9 months ago
- oneAPI Collective Communications Library (oneCCL) ☆237 · Updated 2 weeks ago
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it ☆582 · Updated 2 weeks ago
- collection of benchmarks to measure basic GPU capabilities ☆385 · Updated 4 months ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆119 · Updated 2 years ago