Jokeren/GPA

GPU Performance Advisor

☆66

Alternatives and similar repositories for GPA

Users who are interested in GPA are comparing it to the repositories listed below.

GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆54 · Mar 24, 2024 · Updated 2 years ago
aoli-al / HFuse
Horizontal Fusion
☆24 · Jan 7, 2022 · Updated 4 years ago
Lin-Mao / DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
☆36 · May 30, 2026 · Updated last month
getianao / ngAP
ngAP's artifact for ASPLOS'24
☆25 · Jul 29, 2025 · Updated 11 months ago
reger-men / HPL_GPU
High-Performance Linpack benchmark, adapted for a GPU backend
☆12 · Sep 12, 2022 · Updated 3 years ago
FindHao / drgpu
A Top-Down Profiler for GPU Applications
☆23 · Feb 29, 2024 · Updated 2 years ago
JohndeVostok / APE
A GPU FP32 computation method with Tensor Cores.
☆27 · Dec 8, 2025 · Updated 7 months ago
HAWAIILAB / cuda-flux
CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels
☆33 · Mar 15, 2021 · Updated 5 years ago
HPCToolkit / hpctoolkit-tutorial-examples
CPU and GPU tutorial examples
☆13 · Apr 4, 2025 · Updated last year
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆126 · Jul 11, 2022 · Updated 4 years ago
casys-kaist / HUVM
☆27 · Aug 19, 2022 · Updated 3 years ago
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
illinois-impact / klap
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆15 · Jun 21, 2019 · Updated 7 years ago
AccelProf / AccelProf
A modular program analysis tool framework for accelerators (NVIDIA, AMD, and DL workloads).
☆24 · Jul 5, 2026 · Updated 2 weeks ago
Jokeren / Awesome-GPU
Awesome resources for GPUs
☆635 · Mar 10, 2026 · Updated 4 months ago
DebashisGanguly / gpgpu-sim_UVMSmart
☆83 · Nov 16, 2020 · Updated 5 years ago
wahibium / KFF
Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels
☆14 · Aug 26, 2015 · Updated 10 years ago
pyxis-roc / ptxparser
A parser for PTX 6.5
☆13 · Jun 19, 2023 · Updated 3 years ago
cowanmeg / cgo-artifact-2020
Artifact repository for the paper "Automatic Generation of High-Performance Quantized Machine Learning Kernels"
☆17 · Oct 13, 2020 · Updated 5 years ago
chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17 · Mar 13, 2023 · Updated 3 years ago
NVlabs / NVBit
☆341 · Apr 6, 2026 · Updated 3 months ago
Stefan20162016 / maxas-explained
Explains the parts of Scott Gray's maxas assembler SGEMM that were missing (for me): https://github.com/NervanaSystems/maxas
☆17 · Dec 22, 2018 · Updated 7 years ago
OSU-STARLAB / UVM_benchmark
☆34 · Sep 9, 2020 · Updated 5 years ago
ROCm / TransferBench
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
☆74 · Updated this week
ariasanovsky / ptx-parser
☆11 · Jun 9, 2023 · Updated 3 years ago
accel-sim / gpu-app-collection
A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.
☆93 · Apr 14, 2026 · Updated 3 months ago
cloudcores / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
☆609 · Apr 20, 2023 · Updated 3 years ago
thu-pacman / HyQuas
A quantum circuit simulation system for GPUs based on a hybrid partitioner
☆46 · Aug 17, 2022 · Updated 3 years ago
flagos-ai / libtriton_jit
A Triton JIT runtime and FFI provider in C++
☆37 · Updated this week
csl-iisc / GPM-ASPLOS22
☆36 · Jun 10, 2024 · Updated 2 years ago
yixiaoer / tpu-training-example
☆16 · Jul 8, 2024 · Updated 2 years ago
e-ago / hpgmg-cuda-async
GPUDirect Async implementation of HPGMG-FV CUDA
☆11 · May 11, 2018 · Updated 8 years ago
chhzh123 / Krill
An efficient concurrent graph processing system
☆46 · Oct 27, 2021 · Updated 4 years ago
IntelligentSoftwareSystems / GaloisGPU
LonestarGPU: Irregular algorithms parallelized for GPUs
☆38 · Nov 11, 2019 · Updated 6 years ago
eth-cscs / ext_mpi_collectives
ext_mpi_collectives
☆11 · Jun 3, 2026 · Updated last month
tissue3 / EyerissSimulator
Eyeriss chip simulator
☆41 · Mar 6, 2020 · Updated 6 years ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
OpenPPL / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
☆85 · Mar 20, 2023 · Updated 3 years ago
Sike-Wang / low-bit-Shampoo
4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024)
☆13 · Feb 13, 2025 · Updated last year