microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆469 Updated 6 months ago
Alternatives and similar repositories for antares
Users who are interested in antares are comparing it to the libraries listed below.
- AMD's graph optimization engine. ☆258 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆253 Updated last week
- ☆126 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆478 Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆625 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆212 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆358 Updated this week
- ☆63 Updated 10 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆134 Updated 2 years ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆424 Updated 9 months ago
- Ahead of Time (AOT) Triton Math Library ☆80 Updated last week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆102 Updated 2 months ago
- CUDA Kernel Benchmarking Library ☆742 Updated last week
- ☆271 Updated last week
- A profiler to disclose and quantify hardware features on GPUs. ☆174 Updated 3 years ago
- Shared Middle-Layer for Triton Compilation ☆292 Updated 2 weeks ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆548 Updated 2 years ago
- oneAPI Collective Communications Library (oneCCL) ☆245 Updated this week
- Development repository for the Triton language and compiler ☆137 Updated this week
- oneCCL Bindings for Pytorch* ☆102 Updated 2 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆108 Updated 8 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆230 Updated 3 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆210 Updated 8 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆145 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆115 Updated last week
- ☆156 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆388 Updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆460 Updated this week
- ☆422 Updated this week
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆124 Updated 2 years ago