matiaslindgren / cuda-memory-access-recorder
Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser
☆13 Updated 4 years ago
Alternatives and similar repositories for cuda-memory-access-recorder
Users who are interested in cuda-memory-access-recorder are comparing it to the repositories listed below
- A tracing JIT compiler for PyTorch ☆13 Updated 3 years ago
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ☆24 Updated last month
- A tracing JIT for PyTorch ☆17 Updated 2 years ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 Updated last year
- ☆13 Updated 4 years ago
- Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provi… ☆68 Updated last year
- A task benchmark ☆43 Updated 10 months ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 Updated 4 years ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆46 Updated 3 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55 Updated 3 months ago
- An IR for efficiently simulating distributed ML computation. ☆28 Updated last year
- XLA integration of Open Neural Network Exchange (ONNX) ☆19 Updated 6 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago
- Benchmarks to capture important workloads. ☆31 Updated 4 months ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆23 Updated last year
- ☆32 Updated 4 years ago
- An experimental ahead of time compiler for Relay. ☆50 Updated 5 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆132 Updated 3 years ago
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 Updated 3 years ago
- IREE C++ Template ☆17 Updated 10 months ago
- A Top-Down Profiler for GPU Applications ☆18 Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated last year
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆61 Updated 2 months ago
- Benchmarking some transformer deployments ☆26 Updated 2 years ago
- Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated wr… ☆9 Updated 2 years ago
- ☆16 Updated 9 months ago
- A unified framework across multiple programming platforms ☆41 Updated 3 weeks ago
- ☆52 Updated 10 months ago
- CUPTI GPU Profiler ☆38 Updated 6 years ago
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions ☆27 Updated this week