microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆452 Updated last week
Alternatives and similar repositories for antares:
Users who are interested in antares are comparing it to the libraries listed below.
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99 Updated last week
- Shared Middle-Layer for Triton Compilation ☆220 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆334 Updated this week
- AMD's graph optimization engine. ☆196 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆292 Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆796 Updated this week
- CUDA Kernel Benchmarking Library ☆547 Updated 2 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆127 Updated last year
- OpenAI Triton backend for Intel® GPUs ☆154 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆290 Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆537 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆204 Updated 3 years ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆231 Updated this week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆428 Updated this week
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆972 Updated 4 months ago
- ☆402 Updated this week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆415 Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆286 Updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆331 Updated last month
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆127 Updated this week
- A model compilation solution for various hardware ☆394 Updated 3 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX ☆157 Updated 3 weeks ago
- ☆133 Updated this week
- Microsoft Collective Communication Library ☆330 Updated last year
- collection of benchmarks to measure basic GPU capabilities ☆282 Updated 2 weeks ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆60 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆217 Updated last week
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,405 Updated this week
- Experimental projects related to TensorRT ☆86 Updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆372 Updated last week