HabanaAI / gaudi-pytorch-bridge
☆17 · Updated last month
Alternatives and similar repositories for gaudi-pytorch-bridge
Users interested in gaudi-pytorch-bridge are comparing it to the libraries listed below.
- GitHub mirror of the triton-lang/triton repo. ☆78 · Updated last week
- ☆144 · Updated 4 months ago
- ☆121 · Updated 9 months ago
- Provides examples for writing and building Habana custom kernels using the HabanaTools ☆22 · Updated 5 months ago
- SYCL-based CUTLASS implementation for Intel GPUs ☆40 · Updated this week
- Optimize GEMM with Tensor Cores step by step ☆32 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆83 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆210 · Updated this week
- ☆108 · Updated last year
- A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache ☆60 · Updated last month
- ☆83 · Updated 2 years ago
- ☆238 · Updated last year
- ☆50 · Updated 6 years ago
- ☆135 · Updated 9 months ago
- An extension library of the WMMA API (Tensor Core API) ☆106 · Updated last year
- ☆32 · Updated 3 years ago
- ☆57 · Updated 4 months ago
- ☆151 · Updated last year
- Thunder Research Group's Collective Communication Library ☆42 · Updated 2 months ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆143 · Updated 5 years ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆221 · Updated 2 years ago
- Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores ☆67 · Updated last year
- ☆90 · Updated 10 months ago
- ☆43 · Updated last year
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ☆132 · Updated 2 weeks ago
- A lightweight design for computation-communication overlap ☆177 · Updated 2 weeks ago
- ☆106 · Updated 4 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆49 · Updated this week
- An experimental CPU backend for Triton ☆154 · Updated 4 months ago
- ☆53 · Updated last year