msaroufim / awesome-profiling
Awesome utilities for performance profiling
☆159 Updated last year
Alternatives and similar repositories for awesome-profiling:
Users who are interested in awesome-profiling are comparing it to the libraries listed below.
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆296 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆169 Updated last week
- CUDA checkpoint and restore utility ☆292 Updated 3 weeks ago
- A library to analyze PyTorch traces. ☆332 Updated last week
- Awesome resources for GPUs ☆546 Updated last year
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆128 Updated this week
- MLPerf™ logging library ☆32 Updated this week
- An open-source efficient deep learning framework/compiler, written in Python. ☆681 Updated last week
- ☆180 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆118 Updated last year
- Benchmarks to capture important workloads. ☆29 Updated 3 weeks ago
- ☆130 Updated last week
- PyTorch centric eager mode debugger ☆46 Updated 2 months ago
- Learning about CUDA by writing PTX code. ☆35 Updated 11 months ago
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic … ☆87 Updated this week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆75 Updated last year
- Learning how to write "Less Slow" code in C++ 20, C 99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception hand… ☆432 Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 9 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆178 Updated 2 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆364 Updated 2 weeks ago
- ☆159 Updated 8 months ago
- MLIR-based partitioning system ☆62 Updated this week
- TorchFix - a linter for PyTorch-using code with autofix support ☆129 Updated 2 weeks ago
- Extensible collectives library in Triton ☆83 Updated 4 months ago
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research ☆109 Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆154 Updated 2 months ago
- Cataloging released Triton kernels. ☆168 Updated last month
- CUDA Matrix Multiplication Optimization ☆161 Updated 7 months ago
- Home for OctoML PyTorch Profiler ☆107 Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆304 Updated this week