microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆449 Updated this week
Related projects
Alternatives and complementary repositories for antares
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆315 Updated this week
- AMD's graph optimization engine. ☆187 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆271 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆223 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆202 Updated 2 years ago
- ☆398 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆124 Updated last year
- Shared Middle-Layer for Triton Compilation ☆192 Updated this week
- collection of benchmarks to measure basic GPU capabilities ☆264 Updated 5 months ago
- ☆59 Updated this week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆250 Updated this week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆412 Updated this week
- CUDA Kernel Benchmarking Library ☆519 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆270 Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆769 Updated this week
- Microsoft Collective Communication Library ☆322 Updated last year
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆963 Updated 2 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆523 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆124 Updated this week
- A GPU-driven system framework for scalable AI applications ☆109 Updated last month
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,356 Updated this week
- ☆129 Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆95 Updated this week
- ☆224 Updated 2 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆147 Updated 2 months ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 Updated 6 months ago
- Next generation BLAS implementation for ROCm platform ☆346 Updated this week
- Intel® Extension for TensorFlow* ☆320 Updated last month
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆164 Updated 2 months ago