A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
☆30 · Oct 13, 2024 · Updated last year
Alternatives and similar repositories for DrGPUM
Users who are interested in DrGPUM are comparing it to the repositories listed below.
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- GPU Performance Advisor ☆66 · Jul 25, 2022 · Updated 3 years ago
- ☆10 · May 12, 2022 · Updated 3 years ago
- Awesome resources for GPUs ☆610 · Mar 10, 2026 · Updated last week
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆30 · Dec 21, 2024 · Updated last year
- ☆15 · Sep 17, 2024 · Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆57 · Mar 20, 2025 · Updated last year
- ngAP's artifact for ASPLOS'24 ☆26 · Jul 29, 2025 · Updated 7 months ago
- ☆11 · Jan 4, 2022 · Updated 4 years ago
- Bandwidth test for ROCm ☆80 · Updated this week
- ☆31 · Jun 15, 2022 · Updated 3 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆94 · Nov 6, 2023 · Updated 2 years ago
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. ☆93 · Mar 4, 2026 · Updated 2 weeks ago
- Debug print operator for cudagraph debugging ☆14 · Aug 2, 2024 · Updated last year
- Generate publication-quality figures using python ☆23 · Jun 5, 2016 · Updated 9 years ago
- High Performance Computing Conjugate Gradients: The original Mantevo miniapp ☆19 · Jan 29, 2024 · Updated 2 years ago
- Local inference for the Llama and Tongyi Qianwen (Qwen) large language models, implemented with Apple Metal ☆10 · Apr 26, 2024 · Updated last year
- GPU based Compressed Graph Traversal ☆16 · Jan 9, 2026 · Updated 2 months ago
- ☆12 · Aug 26, 2025 · Updated 6 months ago
- A demo project demonstrating the performance improvement from a C++ extension wrapped with pybind11. ☆10 · Nov 16, 2021 · Updated 4 years ago
- Source code for paper "On the Pareto Front of Multilingual Neural Machine Translation" @ NeurIPS 2023 ☆17 · Sep 27, 2023 · Updated 2 years ago
- ☆29 · Oct 22, 2020 · Updated 5 years ago
- LZW en- and decoding that goes weeeee! ☆32 · Nov 18, 2025 · Updated 4 months ago
- Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference ☆19 · Mar 5, 2023 · Updated 3 years ago
- A task benchmark ☆44 · Aug 5, 2024 · Updated last year
- Source code for the paper: Accelerating Dynamic Graph Analytics on GPUs ☆30 · Jun 19, 2023 · Updated 2 years ago
- A simple cycle-accurate DaDianNao simulator ☆13 · Mar 27, 2019 · Updated 6 years ago
- ☆19 · Jan 17, 2024 · Updated 2 years ago
- Code for the paper: "T-shape data and probabilistic remaining useful life prediction for Li-ion batteries using multiple non-crossing qua… ☆10 · Aug 4, 2023 · Updated 2 years ago
- HCC Sample Applications ☆13 · Jan 3, 2017 · Updated 9 years ago
- DrCCTProf is a fine-grained call path profiling framework for binaries running on ARM and X86 architectures. ☆123 · Oct 26, 2023 · Updated 2 years ago
- ☆33 · Sep 9, 2020 · Updated 5 years ago
- CPU and GPU tutorial examples ☆13 · Apr 4, 2025 · Updated 11 months ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- [FAST'25] ShiftLock: Mitigate One-sided RDMA Lock Contention via Handover. ☆20 · Feb 11, 2025 · Updated last year
- Parsers for CUDA binary files ☆24 · Dec 29, 2023 · Updated 2 years ago
- cuJSON: A Highly Parallel JSON Parser for GPUs ☆42 · Dec 12, 2025 · Updated 3 months ago
- ☆18 · Sep 27, 2022 · Updated 3 years ago
- XSBench: The Monte Carlo Macroscopic Cross Section Lookup Benchmark ☆89 · Mar 11, 2024 · Updated 2 years ago