microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆435 · Updated last week
Related projects:
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆293 · Updated this week
- AMD's graph optimization engine. ☆183 · Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆213 · Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆499 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆250 · Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆195 · Updated 2 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆952 · Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆743 · Updated this week
- ☆392 · Updated last week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆279 · Updated last week
- CUDA Kernel Benchmarking Library ☆482 · Updated 3 months ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆90 · Updated this week
- ROCm Communication Collectives Library (RCCL) ☆251 · Updated this week
- A collection of examples for the ROCm software stack ☆149 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆126 · Updated this week
- ☆221 · Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆123 · Updated 11 months ago
- Shared Middle-Layer for Triton Compilation ☆160 · Updated this week
- Collection of benchmarks to measure basic GPU capabilities ☆241 · Updated 3 months ago
- ☆80 · Updated 4 months ago
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆389 · Updated last year
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆353 · Updated last month
- ☆193 · Updated last year
- ☆124 · Updated this week
- ☆53 · Updated last week
- GPUOcelot: A dynamic compilation framework for PTX ☆136 · Updated 3 months ago
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆185 · Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆116 · Updated this week
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,301 · Updated this week
- A profiler to disclose and quantify hardware features on GPUs. ☆158 · Updated 2 years ago