kriegalex / vscode-cuda
CUDA C++ syntax support & snippets for VSCode
☆20 Updated 4 years ago
Alternatives and similar repositories for vscode-cuda:
Users who are interested in vscode-cuda are comparing it to the libraries listed below.
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- A Deep Learning Framework customized for Sunway TaihuLight ☆40 Updated 6 years ago
- how to design cpu gemm on x86 with avx256, that can beat openblas. ☆70 Updated 6 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆39 Updated 9 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆106 Updated this week
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated 10 months ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 Updated 3 months ago
- flexible-gemm conv of deepcore ☆17 Updated 5 years ago
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.) ☆60 Updated 3 weeks ago
- CUDA Tensor Transpose (cuTT) library ☆51 Updated 7 years ago
- Next generation SPARSE implementation for ROCm platform ☆119 Updated this week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆82 Updated 2 years ago
- An extension library of WMMA API (Tensor Core API) ☆95 Updated 9 months ago
- Next generation LAPACK implementation for ROCm platform ☆99 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆235 Updated this week
- Triton Compiler related materials. ☆28 Updated 3 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆76 Updated this week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 Updated 3 years ago
- CSR-based SpGEMM on nVidia and AMD GPUs ☆45 Updated 9 years ago
- Efficient SpGEMM on GPU using CUDA and CSR ☆51 Updated last year
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆37 Updated 7 years ago
- Tools and extensions for CUDA profiling ☆65 Updated 5 years ago
- ☆67 Updated 11 years ago
- RAJA Performance Suite ☆119 Updated last week
- HCC Sample Applications ☆13 Updated 8 years ago
- modified cutlass ☆14 Updated 4 years ago
- This is a demo how to write a high performance convolution run on apple silicon ☆54 Updated 3 years ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆107 Updated last year