microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆465 Updated last month
Alternatives and similar repositories for antares:
Users who are interested in antares are comparing it to the libraries listed below:
- AMD's graph optimization engine. ☆213 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆234 Updated 2 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆314 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆172 Updated this week
- Shared Middle-Layer for Triton Compilation ☆236 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆214 Updated 3 years ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆373 Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆834 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆309 Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99 Updated last month
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆84 Updated this week
- ☆193 Updated 2 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆465 Updated last year
- Microsoft Collective Communication Library ☆344 Updated last year
- HIPIFY: Convert CUDA to Portable C++ Code ☆567 Updated this week
- ☆106 Updated 3 weeks ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆364 Updated 2 weeks ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆322 Updated this week
- CUDA Kernel Benchmarking Library ☆596 Updated 2 weeks ago
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research ☆110 Updated last year
- ☆409 Updated this week
- oneCCL Bindings for Pytorch* ☆91 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆129 Updated last year
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆74 Updated this week
- A library of GPU kernels for sparse matrix operations. ☆259 Updated 4 years ago
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it ☆534 Updated last week
- Unified Collective Communication Library ☆242 Updated this week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆457 Updated this week
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆225 Updated this week
- collection of benchmarks to measure basic GPU capabilities ☆340 Updated last month