passlab / CUDAMicroBench

☆42

Alternatives and similar repositories for CUDAMicroBench

Users interested in CUDAMicroBench are comparing it to the libraries listed below.

sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆103 Updated 3 years ago
sunlex0717 / DissectingTensorCores
☆106 Updated last year
daadaada / gas
☆45 Updated 4 years ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
NVlabs / NVBit
☆270 Updated 2 months ago
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆106 Updated 7 years ago
ROCm / rocMLIR
☆148 Updated this week
accel-sim / gpu-app-collection
A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.
☆71 Updated last week
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆226 Updated 3 years ago
NMSU-PEARL / PPT-GPU
Performance Prediction Toolkit for GPUs
☆37 Updated 3 years ago
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆134 Updated last year
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆114 Updated 2 years ago
carlushuang / gcnasm
amdgpu example code in hip/asm
☆36 Updated last week
ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆110 Updated 2 months ago
intel / xetla
☆62 Updated 7 months ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆26 Updated last year
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆35 Updated 5 years ago
decodecudabinary / Decoding-CUDA-Binary
☆52 Updated 5 years ago
PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆81 Updated 5 years ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆105 Updated 10 months ago
c3sr / tcu_scope
☆51 Updated 6 years ago
shen203 / GPU_Microbenchmark
☆23 Updated 3 years ago
ROCm / rocSHMEM
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
☆98 Updated this week
FdyCN / PTX-ISA
Chinese translation of the CUDA PTX-ISA document
☆45 Updated 2 months ago
uuudown / Tartan
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
☆65 Updated 6 years ago
0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆132 Updated 3 weeks ago
Lewuathe / mlir-hello
MLIR Sample dialect
☆124 Updated 5 months ago
nod-ai / iree-amd-aie
IREE plugin repository for the AMD AIE accelerator
☆100 Updated this week
nicolaswilde / cuda-tensorcore-hgemm
☆149 Updated 7 months ago