HabanaAI / gaudi-pytorch-bridge
☆17 Updated last month
Alternatives and similar repositories for gaudi-pytorch-bridge
Users who are interested in gaudi-pytorch-bridge are comparing it to the libraries listed below.
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆41 Updated this week
- ☆150 Updated 5 months ago
- ☆241 Updated last year
- ☆141 Updated 9 months ago
- A lightweight design for computation-communication overlap. ☆181 Updated 2 weeks ago
- Provides the examples to write and build Habana custom kernels using the HabanaTools ☆23 Updated 6 months ago
- ☆137 Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆83 Updated this week
- A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache. ☆60 Updated this week
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆89 Updated 2 years ago
- ☆40 Updated 7 months ago
- ☆107 Updated 5 months ago
- ☆50 Updated 6 years ago
- OpenAI Triton backend for Intel® GPUs ☆212 Updated this week
- ☆32 Updated 3 years ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆67 Updated last year
- ☆109 Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆139 Updated last month
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER. ☆36 Updated last year
- Optimize GEMM with tensorcore step by step ☆32 Updated last year
- ☆153 Updated last year
- Shared Middle-Layer for Triton Compilation ☆292 Updated 2 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆63 Updated 3 months ago
- GitHub mirror of triton-lang/triton repo. ☆86 Updated last week
- ☆61 Updated 4 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆221 Updated 2 years ago
- An extension library of WMMA API (Tensor Core API) ☆106 Updated last year
- Thunder Research Group's Collective Communication Library ☆42 Updated 3 months ago
- ☆83 Updated 2 years ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆53 Updated last year