RRZE-HPC/gpu-benches

RRZE-HPC / gpu-benches

collection of benchmarks to measure basic GPU capabilities

☆530

Alternatives and similar repositories for gpu-benches

Users who are interested in gpu-benches are comparing it to the repositories listed below.

sunlex0717 / DissectingTensorCores
☆114 · Apr 19, 2024 · Updated 2 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆126 · Jul 11, 2022 · Updated 4 years ago
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆910 · Updated this week
cloudcores / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆609 · Apr 20, 2023 · Updated 3 years ago
passlab / CUDAMicroBench
☆53 · Jun 24, 2025 · Updated last year
KnowingNothing / MatmulTutorial
A Easy-to-understand TensorOp Matmul Tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
ekondis / mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
☆463 · Jul 12, 2026 · Updated last week
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆909 · Sep 26, 2025 · Updated 9 months ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆557 · Sep 8, 2024 · Updated last year
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago
accel-sim / accel-sim-framework
This is the top-level repository for the Accel-Sim framework.
☆630 · Mar 24, 2026 · Updated 4 months ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆420 · Jan 2, 2025 · Updated last year
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆246 · Jan 13, 2022 · Updated 4 years ago
ColfaxResearch / cfx-article-src
☆193 · May 7, 2025 · Updated last year
ColfaxResearch / cutlass-kernels
☆269 · Jul 11, 2024 · Updated 2 years ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆224 · Jul 18, 2025 · Updated last year
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,563 · Jul 13, 2026 · Updated last week
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆769 · Aug 6, 2025 · Updated 11 months ago
reed-lau / cute-gemm
☆188 · May 11, 2026 · Updated 2 months ago
gpgpu-sim / gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for…
☆1,676 · Feb 15, 2025 · Updated last year
CalebDu / Awesome-Cute
☆121 · May 16, 2025 · Updated last year
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
FZJ-JSC / tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
☆380 · Jun 26, 2026 · Updated 3 weeks ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,498 · Updated this week
Jokeren / Awesome-GPU
Awesome resources for GPUs
☆635 · Mar 10, 2026 · Updated 4 months ago
hibagus / CUDA_Bench
CUDA GPU Benchmark
☆38 · Jan 31, 2025 · Updated last year
SemiAnalysisAI / microbench-blackwell
☆124 · May 10, 2026 · Updated 2 months ago
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆737 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,123 · Updated this week
BBuf / how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
☆3,146 · Updated this week
leepoly / sm-profiler
☆83 · Feb 5, 2026 · Updated 5 months ago
nicolaswilde / cuda-tensorcore-hgemm
☆160 · Dec 26, 2024 · Updated last year
aikitoria / nanotrace
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137 · Jul 17, 2026 · Updated last week
HPMLL / NVIDIA-Hopper-Benchmark
☆116 · May 31, 2025 · Updated last year
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆437 · Jan 4, 2024 · Updated 2 years ago
ademeure / cuda-side-boost
☆60 · Feb 24, 2026 · Updated 5 months ago
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆157 · May 10, 2025 · Updated last year