FindHao/drgpu

A Top-Down Profiler for GPU Applications

☆23

Alternatives and similar repositories for drgpu

Users that are interested in drgpu are comparing it to the libraries listed below.

ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆59 · Mar 20, 2025 · Updated last year
sderek / CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
☆53 · Aug 24, 2018 · Updated 7 years ago
xjdr-alt / mla_blog_translation
☆13 · Jun 18, 2024 · Updated 2 years ago
daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆29 · Aug 12, 2025 · Updated 11 months ago
dorsal-lab / hip-analyzer
Compiler plugin for performance analysis of HIP applications
☆14 · Jul 1, 2026 · Updated 2 weeks ago
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
yester31 / Cutlass_EX
study of cutlass
☆22 · Nov 10, 2024 · Updated last year
Kernel-Machines / kermac
Pytorch routines for (Ker)nel (Mac)hines
☆12 · Oct 10, 2025 · Updated 9 months ago
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆54 · Mar 24, 2024 · Updated 2 years ago
llnl / pLiner
pLiner is a framework that helps programmers identify locations in the source of numerical code that are highly affected by compiler opti…
☆17 · Oct 27, 2023 · Updated 2 years ago
Xuhpclab / jxperf
☆11 · Jan 4, 2022 · Updated 4 years ago
Jokeren / GPA
GPU Performance Advisor
☆66 · Jul 25, 2022 · Updated 3 years ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
Qiong-WU / ARRR_code
☆29 · Oct 22, 2020 · Updated 5 years ago
MoZeWei / moTuner
☆10 · May 12, 2022 · Updated 4 years ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
feifeibear / DPSKV3MFU
Estimate MFU for DeepSeekV3
☆26 · Jan 5, 2025 · Updated last year
hpcgroup / TraceR
Trace Replay and Network Simulation Framework
☆21 · Apr 14, 2021 · Updated 5 years ago
vortexgpgpu / Volt
☆17 · Feb 9, 2026 · Updated 5 months ago
hbiyik / hw_necromancer
☆10 · Jun 6, 2026 · Updated last month
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
regehr / pldi22-llvm-tutorial
outline and links for PLDI 2022 tutorial
☆17 · Jun 13, 2022 · Updated 4 years ago
ROCm / HCC-Example-Application
HCC Sample Applications
☆13 · Jan 3, 2017 · Updated 9 years ago
daajoe / GPUSAT
☆12 · Sep 29, 2021 · Updated 4 years ago
flagos-ai / libtriton_jit
A Triton JIT runtime and ffi provider in C++
☆37 · Updated this week
facebookexperimental / CUTracer
A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
☆72 · Updated this week
eunomia-bpf / cupti-tutorial
Tutorials for NVIDIA CUPTI samples
☆70 · Updated this week
sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32 · Updated this week
sabit / libcprops
cprops - C Prototyping Tools
☆12 · Jul 27, 2012 · Updated 13 years ago
yixiaoer / tpu-training-example
☆16 · Jul 8, 2024 · Updated 2 years ago
AnonymousYWL / MYLIB
☆18 · Apr 8, 2022 · Updated 4 years ago
brandtbucher / brandtbucher
Gary Brandt Bucher, II
☆14 · Oct 22, 2025 · Updated 8 months ago
IaroslavElistratov / triton-autodiff
☆19 · Nov 11, 2025 · Updated 8 months ago
intel / CacheLib
Pluggable in-process caching engine to build and scale high performance services
☆19 · Updated this week
b0nes164 / Decoupled-Fallback-Paper
☆19 · Mar 28, 2026 · Updated 3 months ago
NTT123 / cute-viz
Cute layout visualization
☆43 · Jan 18, 2026 · Updated 6 months ago
StanfordLegion / task-bench
A task benchmark
☆46 · Apr 17, 2026 · Updated 3 months ago
jhoviatt / bfi
A brain*** interpreter in C
☆10 · Jan 21, 2023 · Updated 3 years ago