A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
☆27 · Oct 13, 2024 · Updated last year
Alternatives and similar repositories for DrGPUM
Users who are interested in DrGPUM are comparing it to the repositories listed below.
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- GPU Performance Advisor ☆66 · Jul 25, 2022 · Updated 3 years ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆32 · Dec 21, 2024 · Updated last year
- Implementation of Hyena Hierarchy in JAX ☆10 · Apr 30, 2023 · Updated 2 years ago
- Scripts for monitoring InfiniBand and storage devices ☆11 · Sep 4, 2015 · Updated 10 years ago
- ☆10 · May 12, 2022 · Updated 3 years ago
- ☆12 · Jan 4, 2024 · Updated 2 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆53 · Mar 24, 2024 · Updated last year
- ☆15 · Sep 28, 2020 · Updated 5 years ago
- Debug print operator for cudagraph debugging ☆14 · Aug 2, 2024 · Updated last year
- CUDA Template Functions ☆20 · Dec 16, 2025 · Updated 2 months ago
- GPU based Compressed Graph Traversal ☆16 · Jan 9, 2026 · Updated last month
- ☆18 · Jan 17, 2024 · Updated 2 years ago
- Bandwidth test for ROCm ☆77 · Updated this week
- ☆23 · Jan 27, 2025 · Updated last year
- A Framework for Graph Sampling and Random Walk on GPUs. ☆38 · Feb 3, 2025 · Updated last year
- An implementation of the Llama architecture, to instruct and delight ☆21 · May 31, 2025 · Updated 9 months ago
- [FAST'25] ShiftLock: Mitigate One-sided RDMA Lock Contention via Handover. ☆20 · Feb 11, 2025 · Updated last year
- Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference ☆19 · Mar 5, 2023 · Updated 2 years ago
- Tutorials for Timemory ☆21 · Aug 1, 2024 · Updated last year
- A task benchmark ☆44 · Aug 5, 2024 · Updated last year
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆94 · Nov 6, 2023 · Updated 2 years ago
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. ☆93 · Updated this week
- ☆23 · Jun 18, 2024 · Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆57 · Mar 20, 2025 · Updated 11 months ago
- A practical way of learning Swizzle ☆37 · Feb 3, 2025 · Updated last year
- High Performance Computing Conjugate Gradients: The original Mantevo miniapp ☆19 · Jan 29, 2024 · Updated 2 years ago
- ☆52 · Dec 13, 2022 · Updated 3 years ago
- ngAP's artifact for ASPLOS'24 ☆25 · Jul 29, 2025 · Updated 7 months ago
- Nanos6 is a runtime that implements the OmpSs-2 parallel programming model, developed by the System Tools and Advanced Runtimes (STAR) gr… ☆22 · Jun 6, 2025 · Updated 8 months ago
- ☆21 · Mar 3, 2025 · Updated 11 months ago
- Generate publication-quality figures using python ☆23 · Jun 5, 2016 · Updated 9 years ago
- Global Memory and Threading runtime system ☆25 · Dec 10, 2025 · Updated 2 months ago
- Parsers for CUDA binary files ☆24 · Dec 29, 2023 · Updated 2 years ago
- Experimental GPU language with meta-programming ☆26 · Sep 6, 2024 · Updated last year
- ☆65 · Apr 26, 2025 · Updated 10 months ago
- A wavelet-based multifractal image analysis tool implementing the WTMM (Wavelet Transform Modulus Maxima) method. ☆11 · Feb 1, 2020 · Updated 6 years ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆32 · Jun 5, 2025 · Updated 8 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago