ptheywood / cuda-cmake-github-actions
☆56 Updated 2 months ago
Related projects
Alternatives and complementary repositories for cuda-cmake-github-actions
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SyCL - all it takes to sum a lot of numbers fast! ☆73 Updated 6 months ago
- CUDA kernel author's tools ☆107 Updated 2 years ago
- GitHub Action to install CUDA ☆154 Updated last month
- Some CUDA design patterns and a bit of template magic for CUDA ☆146 Updated last year
- Examples for using SYCL on CUDA ☆60 Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆42 Updated 10 months ago
- Template for starting CUDA/C++ project using CMake with Github Action for CI ☆29 Updated last year
- Source code examples from the Parallel Forall Blog ☆94 Updated 5 years ago
- A nanobind example project ☆90 Updated this week
- Full-speed Array of Structures access ☆160 Updated last year
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 Updated 2 years ago
- ☆20 Updated 5 years ago
- Shared Pointer for Cuda Device Pointers and Cuda Streams, Smart Wrapper to Allocate and Deallocate Cuda Device Buffer. ☆26 Updated last year
- An implementation of BLAS using the SYCL open standard. ☆259 Updated last week
- μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updatin… ☆149 Updated this week
- Local and distributed octrees based on Morton codes with halo discovery and exchange with a 3D collision detection algorithm ☆35 Updated last month
- a CUDA implementation of a priority queue ☆81 Updated 4 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆81 Updated last year
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆99 Updated this week
- Header-only C++20 wrapper for MPI 4.0. ☆43 Updated last year
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆46 Updated 2 months ago
- Reusable software components for ROCm developers ☆78 Updated this week
- WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze ☆17 Updated 4 years ago
- Synchronous, single-threaded, library-only SYCL implementation for debugging and verification. ☆27 Updated last month
- Subset of BLAS routines optimized for NVIDIA GPUs ☆65 Updated last year
- An extension library of WMMA API (Tensor Core API) ☆82 Updated 3 months ago
- C++ library for reading and writing of numpy's .npy files ☆372 Updated last month
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 Updated 5 months ago