libxsmm / tpp-pytorch-extension
Intel® Tensor Processing Primitives extension for Pytorch*
☆17 Updated 3 weeks ago
Alternatives and similar repositories for tpp-pytorch-extension
Users who are interested in tpp-pytorch-extension are comparing it to the libraries listed below
- ☆62 Updated 8 months ago
- collection of benchmarks to measure basic GPU capabilities ☆411 Updated 6 months ago
- ☆106 Updated last year
- OpenAI Triton backend for Intel® GPUs ☆205 Updated this week
- Dissecting NVIDIA GPU Architecture ☆104 Updated 3 years ago
- A CUTLASS implementation using SYCL ☆35 Updated this week
- ☆136 Updated 3 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆229 Updated 3 years ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆452 Updated this week
- ☆31 Updated last year
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆19 Updated 3 months ago
- A home for the final text of all TVM RFCs. ☆106 Updated 11 months ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆35 Updated 5 years ago
- Shared Middle-Layer for Triton Compilation ☆275 Updated this week
- oneCCL Bindings for Pytorch* ☆101 Updated 3 weeks ago
- Development repository for the Triton-Linalg conversion ☆194 Updated 6 months ago
- ☆151 Updated 8 months ago
- ☆50 Updated 6 years ago
- ☆81 Updated 2 years ago
- ☆271 Updated 2 months ago
- ☆147 Updated this week
- ☆131 Updated 8 months ago
- Chinese translation of the CUDA PTX-ISA document ☆44 Updated 3 months ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆115 Updated 2 years ago
- Microsoft Collective Communication Library ☆359 Updated last year
- Code samples related to Intel(R) AMX ☆39 Updated last year
- An extension library of WMMA API (Tensor Core API) ☆103 Updated last year
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆89 Updated 2 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆100 Updated this week
- ☆45 Updated 4 years ago