msaroufim / awesome-profiling
Awesome utilities for performance profiling
☆171 · Updated 2 months ago
Alternatives and similar repositories for awesome-profiling:
Users who are interested in awesome-profiling are comparing it to the libraries listed below:
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆313 · Updated 2 weeks ago
- Benchmarks to capture important workloads. ☆31 · Updated 3 months ago
- Awesome resources for GPUs ☆567 · Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 · Updated last month
- MLIR-based partitioning system ☆82 · Updated this week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆102 · Updated this week
- High-Performance SGEMM on CUDA devices ☆90 · Updated 3 months ago
- End to End steps for adding custom ops in PyTorch. ☆22 · Updated 4 years ago
- A library to analyze PyTorch traces. ☆368 · Updated last week
- ☆27 · Updated 3 months ago
- Home for OctoML PyTorch Profiler ☆113 · Updated 2 years ago
- CUDA checkpoint and restore utility ☆330 · Updated 3 months ago
- PyTorch centric eager mode debugger ☆47 · Updated 4 months ago
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆91 · Updated this week
- extensible collectives library in triton ☆86 · Updated last month
- ☆202 · Updated 2 weeks ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆138 · Updated this week
- ☆148 · Updated this week
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation. ☆148 · Updated 2 years ago
- A list of tutorials, papers, talks, and open-source projects for emerging compiler and architecture ☆444 · Updated 3 months ago
- An IR for efficiently simulating distributed ML computation. ☆28 · Updated last year
- ☆165 · Updated 10 months ago
- ☆102 · Updated last month
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆131 · Updated 3 years ago
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research ☆109 · Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆324 · Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆131 · Updated last year
- GPU documentation for humans ☆46 · Updated 2 weeks ago
- TORCH_LOGS parser for PT2 ☆37 · Updated 2 weeks ago
- Conversions to MLIR EmitC ☆128 · Updated 4 months ago