HabanaAI / gaudi-pytorch-bridge
☆17Updated this week
Alternatives and similar repositories for gaudi-pytorch-bridge
Users who are interested in gaudi-pytorch-bridge are comparing it to the libraries listed below.
- A CUTLASS implementation using SYCL☆32Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆78Updated this week
- ☆127Updated 2 months ago
- ☆102Updated 7 months ago
- ☆106Updated last year
- A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.☆56Updated last week
- A lightweight design for computation-communication overlap.☆155Updated last month
- Optimize GEMM with tensorcore step by step☆31Updated last year
- ☆227Updated last year
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…☆61Updated last month
- GitHub mirror of the triton-lang/triton repo.☆48Updated this week
- OpenAI Triton backend for Intel® GPUs☆197Updated this week
- ☆80Updated 2 years ago
- ☆62Updated 7 months ago
- ☆51Updated 6 years ago
- Provides examples for writing and building Habana custom kernels using the HabanaTools☆22Updated 3 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.☆89Updated 2 years ago
- ☆89Updated 2 months ago
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO☆31Updated last year
- ☆32Updated 2 years ago
- ☆41Updated this week
- An extension library of WMMA API (Tensor Core API)☆99Updated last year
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity☆216Updated last year
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆35Updated 5 years ago
- LLM Inference analyzer for different hardware platforms☆82Updated 3 weeks ago
- Artifacts of EVT ASPLOS'24☆26Updated last year
- ☆96Updated 10 months ago
- ☆128Updated 8 months ago
- nnScaler: Compiling DNN models for Parallel Training☆114Updated this week
- ☆30Updated 4 months ago