libxsmm / tpp-pytorch-extension
Intel® Tensor Processing Primitives extension for Pytorch*
☆10 Updated last week
Alternatives and similar repositories for tpp-pytorch-extension:
Users who are interested in tpp-pytorch-extension are comparing it to the libraries listed below:
- ☆60 Updated 2 months ago
- ☆20 Updated last year
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆18 Updated 2 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆60 Updated 2 months ago
- oneCCL Bindings for Pytorch* ☆88 Updated last month
- oneAPI Collective Communications Library (oneCCL) ☆222 Updated 3 weeks ago
- collection of benchmarks to measure basic GPU capabilities ☆296 Updated last week
- OpenAI Triton backend for Intel® GPUs ☆165 Updated this week
- Performance Prediction Toolkit for GPUs ☆35 Updated 2 years ago
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆217 Updated this week
- ☆26 Updated 10 months ago
- ☆87 Updated 10 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆85 Updated 2 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆64 Updated 6 years ago
- Dissecting NVIDIA GPU Architecture ☆88 Updated 2 years ago
- ☆75 Updated 2 years ago
- ☆47 Updated 5 years ago
- A direct convolution library targeting ARM multi-core CPUs. ☆12 Updated 2 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆212 Updated 3 years ago
- ROCm Communication Collectives Library (RCCL) ☆297 Updated this week
- ☆19 Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆73 Updated last year
- Development repository for the Triton-Linalg conversion ☆173 Updated 2 weeks ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆345 Updated 5 months ago
- A benchmarking suite for heterogeneous systems. The primary goal of this project is to improve and update aspects of existing benchmarkin… ☆40 Updated 11 months ago
- A home for the final text of all TVM RFCs. ☆102 Updated 4 months ago
- An extension library of WMMA API (Tensor Core API) ☆88 Updated 7 months ago
- A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs ☆17 Updated last year
- Synthesizer for optimal collective communication algorithms ☆103 Updated 10 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆349 Updated this week