libxsmm / tpp-pytorch-extension
Intel® Tensor Processing Primitives extension for PyTorch*
☆17 Updated last week
Alternatives and similar repositories for tpp-pytorch-extension
Users who are interested in tpp-pytorch-extension are comparing it to the libraries listed below.
- ☆61 Updated last year
- collection of benchmarks to measure basic GPU capabilities ☆476 Updated 2 months ago
- OpenAI Triton backend for Intel® GPUs ☆222 Updated this week
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆59 Updated this week
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆35 Updated 5 years ago
- ☆110 Updated last year
- Assembler for NVIDIA Volta and Turing GPUs ☆235 Updated 3 years ago
- ☆165 Updated 7 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆499 Updated last week
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER. ☆47 Updated last month
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆138 Updated this week
- Dissecting NVIDIA GPU Architecture ☆115 Updated 3 years ago
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆19 Updated last month
- Development repository for the Triton-Linalg conversion ☆209 Updated 10 months ago
- ☆83 Updated 3 years ago
- Shared Middle-Layer for Triton Compilation ☆321 Updated 2 weeks ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆134 Updated 5 years ago
- ☆34 Updated last year
- ☆20 Updated 2 years ago
- ☆156 Updated last year
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆90 Updated 3 years ago
- ☆50 Updated 6 years ago
- ☆291 Updated 3 months ago
- Microsoft Collective Communication Library ☆377 Updated 2 years ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆285 Updated 4 months ago
- Yinghan's Code Sample ☆361 Updated 3 years ago
- ☆167 Updated last year
- ☆24 Updated 3 years ago
- oneAPI Collective Communications Library (oneCCL) ☆252 Updated last week
- ☆157 Updated last month