msaroufim / awesome-profiling
Awesome utilities for performance profiling
☆169 · Updated 2 weeks ago
Alternatives and similar repositories for awesome-profiling:
Users who are interested in awesome-profiling are comparing it to the libraries listed below.
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆302 · Updated last week
- Benchmarks to capture important workloads. ☆30 · Updated last month
- CUDA checkpoint and restore utility ☆310 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆39 · Updated last week
- Awesome resources for GPUs ☆553 · Updated last year
- A library to analyze PyTorch traces. ☆350 · Updated this week
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research ☆110 · Updated last year
- Dias: Dynamic Rewriting of Pandas Code ☆66 · Updated 3 months ago
- ☆106 · Updated 3 weeks ago
- JAX interpreter for Vulkan ☆13 · Updated 3 years ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆314 · Updated this week
- LLM training in simple, raw C/CUDA ☆92 · Updated 10 months ago
- oneAPI Collective Communications Library (oneCCL) ☆225 · Updated 2 weeks ago
- PArametrized Recommendation and AI Model benchmark is a repository for development of numerous uBenchmarks as well as end-to-end nets for… ☆132 · Updated last week
- TorchFix - a linter for PyTorch-using code with autofix support ☆136 · Updated last month
- MLIR-based partitioning system ☆73 · Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆392 · Updated last month
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆73 · Updated 2 weeks ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆127 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆181 · Updated last month
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆155 · Updated 3 months ago
- Collection of benchmarks to measure basic GPU capabilities ☆308 · Updated last month
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes ☆78 · Updated last month
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆130 · Updated 3 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆178 · Updated 3 months ago
- The Triton backend for the PyTorch TorchScript models. ☆144 · Updated last week
- End-to-end steps for adding custom ops in PyTorch. ☆21 · Updated 4 years ago
- High-Performance SGEMM on CUDA devices ☆86 · Updated 2 months ago
- Multi-Instance GPU (MIG) profiling tool ☆57 · Updated last year
- A CPU+GPU profiling library that provides access to timeline traces and hardware performance counters. ☆784 · Updated this week