khaki3 / ptxas-wrapper
A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code
☆15Updated 2 years ago
Alternatives and similar repositories for ptxas-wrapper
Users who are interested in ptxas-wrapper compare it to the repositories listed below.
- ☆40Updated last month
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware☆15Updated 3 years ago
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/…☆22Updated 2 years ago
- AI Accelerators-SC23-tutorial Repository☆11Updated last year
- ☆54Updated 5 years ago
- TAPA is a dataflow HLS framework that features fast compilation, expressive programming model and generates high-frequency FPGA accelerat…☆19Updated last year
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆56Updated 7 months ago
- A GPU benchmark suite for autotuners☆19Updated last year
- A translator from NVPTX to SPIR-V, modified from the LLVM-SPIR-V Translator.☆43Updated 4 years ago
- LLVM Plugin to Instrument Global Memory Accesses in CUDA Kernels☆10Updated 5 years ago
- ☆18Updated last year
- Heterogeneous Accelerated Computed Cluster (HACC) Resources Page☆22Updated last month
- PIRA - Automatic Instrumentation Refinement☆16Updated last year
- ☆38Updated 3 years ago
- ☆64Updated 6 years ago
- Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware"☆28Updated 3 years ago
- ☆19Updated last month
- A dynamic analysis tool to detect floating-point errors in HPC applications.☆36Updated 2 weeks ago
- RISC-V vector extension ISA simulation☆16Updated 6 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Updated 6 years ago
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels☆32Updated 4 years ago
- BEER determines an ECC code's parity-check matrix based on the uncorrectable errors it can cause. BEER targets Hamming codes that are use…☆19Updated 5 years ago
- The University of Bristol HPC Simulation Engine☆100Updated 2 months ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs☆30Updated last year
- ☆62Updated last year
- CUDAAdvisor: a GPU profiling tool☆51Updated 7 years ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)☆50Updated last week
- Slides and exercises for persistent memory programming tutorial☆14Updated 2 years ago
- A Benchmark Toolkit for Assembly Instructions Using the LLVM JIT☆17Updated 5 years ago
- Multiple 1-stencil implementations using NVIDIA CUDA.☆13Updated 7 years ago