msaroufim / awesome-profiling
Awesome utilities for performance profiling
☆196Updated 7 months ago
Alternatives and similar repositories for awesome-profiling
Users who are interested in awesome-profiling are comparing it to the libraries listed below.
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the…☆349Updated this week
 - Awesome resources for GPUs☆600Updated 2 years ago
 - CUDA checkpoint and restore utility☆379Updated last month
 - Home for OctoML PyTorch Profiler☆114Updated 2 years ago
 - AI/GPU flame graph☆189Updated 3 weeks ago
 - A stand-alone implementation of several NumPy dtype extensions used in machine learning.☆305Updated this week
 - TORCH_LOGS parser for PT2☆62Updated last month
 - Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes☆134Updated 7 months ago
 - Machine Learning Framework for Operating Systems - Brings ML to Linux kernel☆250Updated 3 years ago
 - GPUOcelot: A dynamic compilation framework for PTX☆211Updated 8 months ago
 - MLIR-based partitioning system☆143Updated this week
 - Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake usin…☆29Updated 2 weeks ago
 - An experimental CPU backend for Triton (https://github.com/openai/triton)☆47Updated 2 months ago
 - High-Performance SGEMM on CUDA devices☆107Updated 9 months ago
 - TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆167Updated this week
 - Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research☆112Updated 2 years ago
 - A library to analyze PyTorch traces.☆421Updated this week
 - High-performance safetensors model loader☆70Updated 2 weeks ago
 - torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…☆181Updated 2 months ago
 - A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")☆359Updated this week
 - A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.☆543Updated this week
 - Benchmarks to capture important workloads.☆31Updated 9 months ago
 - 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.☆64Updated 9 months ago
 - An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models☆50Updated 2 months ago
 - LLM training in simple, raw C/CUDA☆107Updated last year
 - Python bindings for UCX☆140Updated last month
 - Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.☆389Updated last week
 - Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.☆147Updated 2 years ago
 - Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆98Updated 2 weeks ago
 - An open-source efficient deep learning framework/compiler, written in Python.☆733Updated last month