codeplaysoftware / cutlass-sycl

A CUTLASS implementation using SYCL

☆31

Alternatives and similar repositories for cutlass-sycl

Users who are interested in cutlass-sycl are comparing it to the libraries listed below.

intel / xetla
☆62 Updated 7 months ago
ROCm / rocSHMEM
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
☆92 Updated this week
ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆107 Updated last month
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆224 Updated 3 years ago
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆193 Updated this week
intel / torch-xpu-ops
☆48 Updated this week
sunlex0717 / DissectingTensorCores
☆104 Updated last year
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆101 Updated 3 years ago
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆40 Updated 11 months ago
ROCm / rocWMMA
rocWMMA
☆119 Updated this week
ROCm / rocMLIR
☆148 Updated this week
intel / intel-extension-for-deepspeed
Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…
☆61 Updated 2 weeks ago
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆70 Updated last week
ROCm / rocmProfileData
☆25 Updated 3 weeks ago
merthidayetoglu / HiCCL
A hierarchical collective communications library with portable optimizations
☆35 Updated 7 months ago
ROCm / rocprofiler-compute
Advanced Profiling and Analytics for AMD Hardware
☆159 Updated this week
HabanaAI / Habana_Custom_Kernel
Provides the examples to write and build Habana custom kernels using the HabanaTools
☆22 Updated 3 months ago
ROCm / rccl-tests
RCCL Performance Benchmark Tests
☆70 Updated this week
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆106 Updated 7 years ago
pytorch-labs / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆43 Updated 4 months ago
ROCm / rocprofiler
ROC profiler library. Profiling with perf-counters and derived metrics.
☆150 Updated this week
c3sr / tcu_scope
☆51 Updated 6 years ago
intel / torch-ccl
oneCCL Bindings for Pytorch*
☆99 Updated last week
Jokeren / GPA
GPU Performance Advisor
☆65 Updated 2 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆33 Updated 4 years ago
temporal-hpc / reduction-tensor-cores
Fast GPU based tensor core reductions
☆13 Updated 2 years ago
carlushuang / gcnasm
amdgpu example code in hip/asm
☆35 Updated last month
shixun404 / Fault-Tolerant-SGEMM-on-NVIDIA-GPUs
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
☆12 Updated 3 months ago
oneapi-src / level-zero-spec
☆20 Updated 2 months ago