NVIDIA/PyProf

A GPU performance profiling tool for PyTorch models

☆510

Alternatives and similar repositories for PyProf

Users who are interested in PyProf are comparing it to the libraries listed below.

Stonesjtu / pytorch_memlab
Profiling and inspecting memory in pytorch
☆1,079 · Jun 8, 2026 · Updated last month
awwong1 / torchprof
PyTorch layer-by-layer model profiler
☆605 · May 23, 2021 · Updated 5 years ago
adityaiitb / pyprof2
PyProf2: PyTorch Profiling tool
☆82 · Jun 25, 2020 · Updated 6 years ago
NVIDIA / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
☆8,985 · Updated this week
pytorch / TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
☆2,980 · Updated this week
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,411 · Apr 26, 2025 · Updated last year
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆974 · Updated this week
pytorch / elastic
PyTorch elastic training
☆727 · Jun 15, 2022 · Updated 4 years ago
NVIDIA / DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear…
☆5,728 · Updated this week
Santosh-Gupta / SpeedTorch
Library for faster pinned CPU <-> GPU transfer in Pytorch
☆682 · Feb 21, 2020 · Updated 6 years ago
kakaobrain / torchgpipe
A GPipe implementation in PyTorch
☆865 · Jul 25, 2024 · Updated last year
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆286 · Dec 17, 2025 · Updated 7 months ago
facebookresearch / pycls
Codebase for Image Classification Research, written in PyTorch.
☆2,164 · Mar 20, 2024 · Updated 2 years ago
facebookresearch / fvcore
Collection of common code that's shared among different research projects in FAIR computer vision team.
☆2,249 · Jun 2, 2026 · Updated last month
pytorch / benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
☆1,042 · Updated this week
pytorch / tvm
TVM integration into PyTorch
☆455 · Jan 15, 2020 · Updated 6 years ago
StacyYang / AutoTorch
AutoTorch, A HPO Toolkit
☆60 · May 25, 2020 · Updated 6 years ago
zhijian-liu / torchprofile
Count the MACs / FLOPs of PyTorch models
☆641 · Mar 11, 2026 · Updated 4 months ago
facebookresearch / higher
higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual tr…
☆1,629 · Mar 25, 2022 · Updated 4 years ago
pytorch / extension-cpp
C++ extensions in PyTorch
☆1,196 · Jan 13, 2026 · Updated 6 months ago
pytorch / contrib
Implementations of ideas from recent papers
☆389 · Dec 22, 2020 · Updated 5 years ago
pytorch / torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,078 · Apr 17, 2024 · Updated 2 years ago
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,897 · Updated this week
meta-pytorch / data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
☆1,258 · Updated this week
NVIDIA / libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
☆2,304 · Feb 7, 2024 · Updated 2 years ago
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆902 · Updated this week
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399 · Jul 14, 2026 · Updated last week
rusty1s / pytorch_scatter
PyTorch Extension Library of Optimized Scatter Operations
☆1,743 · Jun 3, 2026 · Updated last month
NVIDIA-AI-IOT / torch2trt
An easy to use PyTorch to TensorRT converter
☆4,879 · Aug 17, 2024 · Updated last year
meta-pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆179 · Dec 16, 2025 · Updated 7 months ago
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,442 · Mar 27, 2024 · Updated 2 years ago
adityaiitb / PyProf
A GPU performance profiling tool for PyTorch models
☆22 · Jul 5, 2022 · Updated 4 years ago
NVIDIA / DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter…
☆14,830 · Aug 12, 2024 · Updated last year
spcl / substation
Research and development for optimizing transformers
☆132 · Feb 16, 2021 · Updated 5 years ago
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,571 · Updated this week
Lyken17 / pytorch-OpCounter
Count the MACs / FLOPs of your PyTorch model.
☆5,077 · Jul 8, 2024 · Updated 2 years ago
NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆547 · Updated this week
NVIDIA / runx
Deep Learning Experiment Management
☆641 · Dec 11, 2025 · Updated 7 months ago
pytorch / ort
Accelerate PyTorch models with ONNX Runtime
☆369 · Feb 5, 2026 · Updated 5 months ago