matiaslindgren / cuda-memory-access-recorder
Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser
☆13 · Updated 5 years ago
Alternatives and similar repositories for cuda-memory-access-recorder
Users who are interested in cuda-memory-access-recorder are comparing it to the libraries listed below.
- A tracing JIT compiler for PyTorch ☆13 · Updated 4 years ago
- GPU load-balancing library for regular and irregular computations. ☆64 · Updated 4 months ago
- An IR for efficiently simulating distributed ML computation. ☆32 · Updated 2 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆48 · Updated 5 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆27 · Updated last year
- A tracing JIT for PyTorch ☆17 · Updated 3 years ago
- Torch Frontend for IREE ☆25 · Updated 2 years ago
- A lightweight, Pythonic frontend for MLIR ☆80 · Updated 2 years ago
- Experiments and prototypes associated with IREE or MLIR ☆56 · Updated last year
- IREE C++ Template ☆17 · Updated last year
- A task benchmark ☆44 · Updated last year
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆50 · Updated 7 years ago
- ☆53 · Updated 8 months ago
- TORCH_TRACE parser for PT2 ☆71 · Updated this week
- CUDAAdvisor: a GPU profiling tool ☆51 · Updated 7 years ago
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆123 · Updated 9 months ago
- A language and compiler for irregular tensor programs. ☆152 · Updated last year
- Benchmarks to capture important workloads. ☆32 · Updated last week
- CUPTI GPU Profiler ☆40 · Updated 6 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆138 · Updated 2 years ago
- ☆55 · Updated last year
- A framework that helps implement swizzle GPU kernels ☆51 · Updated 5 years ago
- ☆27 · Updated 2 years ago
- Re-implementation of the TASO compiler using equality saturation ☆139 · Updated 4 years ago
- ☆20 · Updated 6 years ago
- ☆13 · Updated 4 years ago
- An experimental ahead-of-time compiler for Relay. ☆50 · Updated 5 years ago
- Extensible collectives library in Triton ☆92 · Updated 9 months ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆49 · Updated 4 years ago
- GEMM and Winograd based convolutions using CUTLASS ☆28 · Updated 5 years ago