msaroufim / awesome-profiling
Awesome utilities for performance profiling
☆152Updated last year
Alternatives and similar repositories for awesome-profiling:
Users who are interested in awesome-profiling are comparing it to the libraries listed below:
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the…☆289Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆38Updated 8 months ago
- A stand-alone implementation of several NumPy dtype extensions used in machine learning.☆240Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")☆291Updated this week
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…☆179Updated last month
- Home for OctoML PyTorch Profiler☆107Updated last year
- ☆154Updated 7 months ago
- Extensible collectives library in Triton☆76Updated 3 months ago
- GPUOcelot: A dynamic compilation framework for PTX☆157Updated 3 weeks ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.☆114Updated last year
- A library to analyze PyTorch traces.☆324Updated last month
- An experimental CPU backend for Triton☆75Updated this week
- Cataloging released Triton kernels.☆156Updated last week
- TORCH_LOGS parser for PT2☆30Updated last week
- Awesome resources for GPUs☆522Updated last year
- ☆170Updated last week
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings.☆75Updated last week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…☆152Updated last month
- Implementation of a Transformer, but completely in Triton☆251Updated 2 years ago
- End to End steps for adding custom ops in PyTorch.☆19Updated 4 years ago
- Benchmarks to capture important workloads.☆29Updated this week
- Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient 🚀☆113Updated last year
- ☆23Updated this week
- PyTorch-centric eager mode debugger☆43Updated last month
- A sandbox for quick iteration and experimentation on projects related to IREE, MLIR, and LLVM☆56Updated 4 months ago
- An open-source, efficient deep learning framework/compiler, written in Python.☆668Updated this week
- PyTorch RFCs (experimental)☆131Updated 4 months ago
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic …☆84Updated this week
- CUDA checkpoint and restore utility☆268Updated 9 months ago
- LLM training in simple, raw C/CUDA☆91Updated 8 months ago