HabanaAI / gaudi-pytorch-bridge
☆17 · Updated 3 months ago
Alternatives and similar repositories for gaudi-pytorch-bridge
Users interested in gaudi-pytorch-bridge are comparing it to the libraries listed below.
- ☆152 · Updated 11 months ago
- ☆165 · Updated 7 months ago
- SYCL* Templates for Linear Algebra (SYCL*TLA) - a SYCL-based CUTLASS implementation for Intel GPUs · ☆59 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆85 · Updated this week
- GitHub mirror of the triton-lang/triton repo. · ☆109 · Updated this week
- ☆253 · Updated last year
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity · ☆230 · Updated 2 years ago
- ☆110 · Updated last year
- OpenAI Triton backend for Intel® GPUs · ☆222 · Updated this week
- ☆32 · Updated 3 years ago
- A lightweight design for computation-communication overlap. · ☆200 · Updated 2 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) for deep learning on Tensor Cores. · ☆90 · Updated 3 years ago
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". This is a fork of Bakita's repo; I am not one of the authors of the paper. · ☆47 · Updated last month
- Optimize GEMM with tensorcore step by step · ☆36 · Updated 2 years ago
- ☆50 · Updated 6 years ago
- Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores. · ☆70 · Updated last year
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA · ☆35 · Updated 5 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper · ☆103 · Updated 7 years ago
- ☆51 · Updated 9 months ago
- ☆83 · Updated 3 years ago
- ☆112 · Updated 7 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… · ☆63 · Updated 5 months ago
- ☆103 · Updated last year
- ☆69 · Updated 6 months ago
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. · ☆69 · Updated 9 months ago
- Artifacts of EVT ASPLOS'24 · ☆28 · Updated last year
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling · ☆66 · Updated last year
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache. · ☆71 · Updated last week
- Shared Middle-Layer for Triton Compilation · ☆321 · Updated 2 weeks ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer · ☆147 · Updated 3 months ago