HabanaAI / Habana_Custom_Kernel
Provides examples for writing and building Habana custom kernels using the HabanaTools
☆19 Updated last month
Alternatives and similar repositories for Habana_Custom_Kernel:
Users who are interested in Habana_Custom_Kernel are comparing it to the libraries listed below:
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆38 Updated 2 years ago
- ☆64 Updated 2 months ago
- An extension library of WMMA API (Tensor Core API) ☆87 Updated 6 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆62 Updated 6 years ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆47 Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆11 Updated this week
- ☆66 Updated 3 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆58 Updated last month
- Benchmarks to capture important workloads. ☆29 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆37 Updated 8 months ago
- ☆81 Updated 8 months ago
- ☆72 Updated 2 years ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99 Updated this week
- ☆59 Updated last month
- ☆178 Updated 6 months ago
- ☆131 Updated this week
- ☆27 Updated 3 weeks ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆36 Updated 5 months ago
- ☆46 Updated 5 years ago
- CUTLASS and CuTe Examples ☆35 Updated 2 weeks ago
- ☆23 Updated 5 years ago
- ☆40 Updated 4 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆204 Updated 3 years ago
- oneCCL Bindings for Pytorch* ☆87 Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆47 Updated this week
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆117 Updated 2 years ago
- OpenAI Triton backend for Intel® GPUs ☆154 Updated this week
- An IR for efficiently simulating distributed ML computation. ☆25 Updated last year
- ☆48 Updated 10 months ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆78 Updated this week