matiaslindgren / cuda-memory-access-recorder
Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser
⭐ 13 · Updated 5 years ago
Alternatives and similar repositories for cuda-memory-access-recorder
Users interested in cuda-memory-access-recorder are comparing it to the libraries listed below.
- A tracing JIT compiler for PyTorch ⭐ 13 · Updated 4 years ago
- Nod.ai 🦈 version of IREE. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ⭐ 107 · Updated 2 weeks ago
- An IR for efficiently simulating distributed ML computation. ⭐ 31 · Updated last year
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ⭐ 44 · Updated 3 months ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ⭐ 47 · Updated 4 years ago
- Benchmarks to capture important workloads. ⭐ 31 · Updated 10 months ago
- Bandwidth test for ROCm ⭐ 70 · Updated last week
- GPU load-balancing library for regular and irregular computations. ⭐ 63 · Updated 3 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ⭐ 27 · Updated last year
- A tracing JIT for PyTorch ⭐ 17 · Updated 3 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ⭐ 47 · Updated 3 months ago
- ⭐ 54 · Updated last year
- Experiments and prototypes associated with IREE or MLIR ⭐ 56 · Updated last year
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions ⭐ 35 · Updated 3 months ago
- portDNN is a library implementing neural network algorithms written using SYCL ⭐ 113 · Updated last year
- High-Performance SGEMM on CUDA devices ⭐ 113 · Updated 10 months ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ⭐ 104 · Updated last week
- Training neural networks in TensorFlow 2.0 with 5x less memory ⭐ 137 · Updated 3 years ago
- IREE C++ Template ⭐ 18 · Updated last year
- Ahead of Time (AOT) Triton Math Library ⭐ 84 · Updated last month
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ⭐ 22 · Updated 5 years ago
- ⭐ 50 · Updated last year
- Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provi… ⭐ 68 · Updated last year
- A unified framework across multiple programming platforms ⭐ 42 · Updated 6 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ⭐ 109 · Updated 8 years ago
- ⭐ 13 · Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ⭐ 56 · Updated 2 weeks ago
- XLA integration of Open Neural Network Exchange (ONNX) ⭐ 19 · Updated 7 years ago
- GEMM and Winograd based convolutions using CUTLASS ⭐ 28 · Updated 5 years ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ⭐ 26 · Updated 2 years ago