kriegalex / vscode-cuda
CUDA C++ syntax support & snippets for VSCode
☆20 · Updated 3 years ago
Alternatives and similar repositories for vscode-cuda:
Users who are interested in vscode-cuda are comparing it to the repositories listed below:
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 · Updated 7 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆106 · Updated this week
- Tools and extensions for CUDA profiling ☆65 · Updated 5 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- HCC Sample Applications ☆13 · Updated 8 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆111 · Updated 9 months ago
- An extension library of WMMA API (Tensor Core API) ☆90 · Updated 7 months ago
- how to design cpu gemm on x86 with avx256, that can beat openblas. ☆68 · Updated 5 years ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 · Updated 2 weeks ago
- study of cutlass ☆21 · Updated 3 months ago
- CUDA Tensor Transpose (cuTT) library ☆51 · Updated 7 years ago
- Dissecting NVIDIA GPU Architecture ☆89 · Updated 2 years ago
- flexible-gemm conv of deepcore ☆17 · Updated 5 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆38 · Updated 9 years ago
- Kernel Tuning Toolkit ☆59 · Updated last month
- ROCm Parallel Primitives ☆171 · Updated this week
- This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger. ☆53 · Updated this week
- RCCL Performance Benchmark Tests ☆59 · Updated last month
- Optimized half precision gemm assembly kernels (deprecated due to ROCm) ☆47 · Updated 7 years ago
- assembler for NVIDIA FERMI. Imported from Google Code ☆72 · Updated 9 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 · Updated 4 years ago
- Examples for using SYCL on CUDA ☆62 · Updated last week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 · Updated 3 years ago
- ROCm Device Libraries ☆97 · Updated 10 months ago
- HIP back-end for Thrust that has been replaced by rocThrust ☆28 · Updated last year
- Efficient SpGEMM on GPU using CUDA and CSR ☆52 · Updated last year
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆18 · Updated 8 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆55 · Updated 3 weeks ago
- Learn OpenCL step by step. ☆133 · Updated 2 years ago
- My notes on various HPC papers. ☆21 · Updated 2 years ago