eunomia-bpf/cupti-tutorial

Tutorials for NVIDIA CUPTI samples

☆70

Alternatives and similar repositories for cupti-tutorial

Users who are interested in cupti-tutorial are comparing it to the repositories listed below.

eunomia-bpf / nccl-eBPF
☆20 · Jul 7, 2026 · Updated 2 weeks ago
eunomia-bpf / xpu-perf
Continuous profiler for GPU & CPU with eBPF
☆24 · Nov 24, 2025 · Updated 8 months ago
wu-kan / wuk_cupti_wrapper
A simple API for using CUPTI
☆10 · Aug 19, 2025 · Updated 11 months ago
eunomia-bpf / bpfix
Making eBPF verifier errors as friendly as Rust compiler errors, and letting an LLM fix them.
☆18 · Jul 11, 2026 · Updated 2 weeks ago
parca-dev / parcagpu
CUPTI-based GPU profiling library exposing USDT hooks
☆37 · Jun 30, 2026 · Updated 3 weeks ago
open-neutrino / neutrino
☆264 · Dec 25, 2025 · Updated 7 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
FindHao / drgpu
A Top-Down Profiler for GPU Applications
☆23 · Feb 29, 2024 · Updated 2 years ago
eunomia-bpf / XDP-on-GPU
eBPF XDP on GPU
☆16 · Oct 5, 2025 · Updated 9 months ago
eunomia-bpf / gpu_ext
eBPF for GPU UVM offloading and scheduling in the Linux kernel
☆59 · Apr 15, 2026 · Updated 3 months ago
atomicapple0 / libsmctrl
Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER.
☆67 · Nov 24, 2025 · Updated 8 months ago
facebookexperimental / CUTracer
A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
☆72 · Updated this week
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆211 · Jul 15, 2026 · Updated last week
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆101 · Jan 16, 2026 · Updated 6 months ago
NVIDIA / CompileIQ
An optimizer for NVIDIA compilers.
☆110 · Jul 3, 2026 · Updated 3 weeks ago
ParCoreLab / Snoopie
Multi-GPU communication profiler and visualizer
☆43 · Jun 10, 2024 · Updated 2 years ago
yao-jz / intra-kernel-profiler
Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, NSys, and an interactive Explorer.
☆122 · Apr 17, 2026 · Updated 3 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
Nelson-Cheung / yatsenos-riscv
Rebuild YatSenOS On RISC-V 64.
☆23 · Jan 6, 2022 · Updated 4 years ago
Multi-V-VM / GPUOS
Share your GPU without MIG or MPS
☆50 · Jan 27, 2026 · Updated 5 months ago
matinraayai / Luthier
Luthier, a GPU binary instrumentation tool for AMD GPUs
☆28 · Updated this week
meta-pytorch / spmd_types
This module defines a type system for distributed training code, based off of JAX's sharding in types, but adapted for the PyTorch ecosys…
☆34 · Updated this week
eunomia-bpf / eGPU
Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime)
☆308 · Nov 24, 2025 · Updated 8 months ago
ByteDance-Seed / StragglerAnalysis
☆56 · Apr 30, 2025 · Updated last year
inclusionAI / asystem-amem
An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆113 · Dec 17, 2025 · Updated 7 months ago
0x5ec1ab / gpu-tlb
☆84 · Apr 18, 2025 · Updated last year
aaupov / ebpf-bolt
eBPF tool to collect BOLT profiles
☆14 · Apr 9, 2026 · Updated 3 months ago
eunomia-bpf / basic-cuda-tutorial
A collection of CUDA programming examples to learn GPU programming
☆111 · Updated this week
WitchTools / Witch
Lightweight performance and debugging tools
☆17 · Feb 21, 2020 · Updated 6 years ago
fzyzcjy / torch_utils
Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…
☆114 · Sep 11, 2025 · Updated 10 months ago
vortexgpgpu / NVPTX-SPIRV-Translator
A translator from NVPTX to SPIR-V, modified from the LLVM-SPIR-V Translator.
☆45 · Oct 25, 2021 · Updated 4 years ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆33 · Feb 24, 2025 · Updated last year
sderek / CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
☆53 · Aug 24, 2018 · Updated 7 years ago
ParCoreLab / CPU-Free-model
Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme…
☆21 · Apr 25, 2024 · Updated 2 years ago
flagos-ai / libtriton_jit
A Triton JIT runtime and FFI provider in C++
☆37 · Updated this week
zejia-lin / BulletServe
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
☆53 · Jan 8, 2026 · Updated 6 months ago
Jokeren / triton-samples
☆29 · Jan 17, 2025 · Updated last year
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆54 · Mar 24, 2024 · Updated 2 years ago
NVIDIA / nsight-python
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
☆283 · Updated this week