microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization, supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, and Android CPU/GPU backends.
☆468 · Updated this week
Alternatives and similar repositories for antares
Users who are interested in antares are comparing it to the libraries listed below:
- Stretching GPU performance for GEMMs and tensor contractions. ☆235 · Updated last week
- AMD's graph optimization engine. ☆215 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆383 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆318 · Updated this week
- MatMul Performance Benchmarks for a Single CPU Core, comparing both hand-engineered and codegen kernels. ☆130 · Updated last year
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description. ☆981 · Updated 7 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆376 · Updated 2 weeks ago
- Shared Middle-Layer for Triton Compilation ☆246 · Updated last week
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆117 · Updated 2 years ago
- ☆192 · Updated 2 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆217 · Updated 3 years ago
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆843 · Updated last week
- OpenAI Triton backend for Intel® GPUs ☆182 · Updated this week
- Experimental projects related to TensorRT ☆97 · Updated last week
- ☆410 · Updated last week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99 · Updated 2 months ago
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆91 · Updated this week
- Microsoft Collective Communication Library ☆343 · Updated last year
- ☆106 · Updated 2 weeks ago
- CUDA Kernel Benchmarking Library ☆621 · Updated this week
- Backward-compatible ML compute opset inspired by HLO/MHLO ☆467 · Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆232 · Updated 3 weeks ago
- A home for the final text of all TVM RFCs. ☆102 · Updated 7 months ago
- Collection of benchmarks to measure basic GPU capabilities ☆364 · Updated 2 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆342 · Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆134 · Updated last week
- ☆141 · Updated this week
- ☆251 · Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆571 · Updated last week
- A model compilation solution for various hardware ☆427 · Updated this week