kriegalex / vscode-cuda
CUDA C++ syntax support & snippets for VSCode
☆20Updated 3 years ago
Alternatives and similar repositories for vscode-cuda:
Users who are interested in vscode-cuda are comparing it to the repositories listed below
- A Deep Learning Framework customized for Sunway TaihuLight☆40Updated 6 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆83Updated 11 months ago
- CUDA Tensor Transpose (cuTT) library☆51Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆104Updated 7 years ago
- Dissecting NVIDIA GPU Architecture☆84Updated 2 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆79Updated last year
- Tools and extensions for CUDA profiling☆63Updated 5 years ago
- How to design a CPU GEMM on x86 with avx256 that can beat OpenBLAS.☆67Updated 5 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.☆36Updated 7 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆32Updated 4 years ago
- Online CUDA Occupancy Calculator☆74Updated 3 years ago
- Emulating DMA Engines on GPUs for Performance and Portability☆35Updated 9 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs☆104Updated this week
- Triton Compiler related materials.☆29Updated 3 weeks ago
- Efficient SpGEMM on GPU using CUDA and CSR☆50Updated last year
- a c++/cuda template library for tensor lazy evaluation☆163Updated last year
- gossip: Efficient Communication Primitives for Multi-GPU Systems☆58Updated 2 years ago
- ☆14Updated 2 years ago
- Python bindings for NVTX☆66Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆123Updated 4 years ago
- An extension library of WMMA API (Tensor Core API)☆87Updated 6 months ago
- flexible-gemm conv of deepcore☆17Updated 5 years ago
- Training material for Nsight developer tools☆143Updated 5 months ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.☆59Updated 2 years ago
- Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels☆13Updated 9 years ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial.☆93Updated 3 years ago
- study of cutlass☆20Updated 2 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API☆74Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL☆109Updated 8 months ago
- Efficient Top-K implementation on the GPU☆150Updated 5 years ago