microsoft / antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
☆467 Updated 9 months ago
Alternatives and similar repositories for antares
Users who are interested in antares are comparing it to the libraries listed below.
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆254 Updated last week
- AMD's graph optimization engine. ☆275 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆380 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆518 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆226 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆390 Updated last week
- ☆137 Updated last week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆448 Updated this week
- ☆281 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆254 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆148 Updated last week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆507 Updated last week
- ☆422 Updated last month
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆457 Updated last week
- ☆304 Updated this week
- Shared Middle-Layer for Triton Compilation ☆325 Updated 2 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆138 Updated 2 years ago
- oneCCL Bindings for Pytorch* (deprecated) ☆104 Updated last month
- ☆61 Updated last year
- Assembler for NVIDIA Volta and Turing GPUs ☆238 Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆410 Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆653 Updated this week
- collection of benchmarks to measure basic GPU capabilities ☆492 Updated 3 months ago
- Development repository for the Triton language and compiler ☆140 Updated last week
- Experimental projects related to TensorRT ☆118 Updated last week
- ☆165 Updated last week
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it ☆681 Updated this week
- Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloade… ☆613 Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆219 Updated last year
- This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific… ☆201 Updated this week