fynv / ThrustRTC
CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.
☆59 Updated 2 years ago
Related projects
Alternatives and complementary repositories for ThrustRTC
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- ☆56 Updated 2 months ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆100 Updated last year
- Full-speed Array of Structures access ☆161 Updated last year
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆35 Updated 2 months ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆92 Updated 2 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆65 Updated last year
- ☆20 Updated 5 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆100 Updated this week
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 Updated 10 months ago
- CUDA kernel author's tools ☆109 Updated 2 years ago
- A library of various helper routines and frameworks used by many of the lab's software ☆43 Updated 6 months ago
- BGHT: High-performance static GPU hash tables. ☆55 Updated 2 months ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆48 Updated 3 months ago
- High-performance, GPU-aware communication library ☆84 Updated last month
- Source code examples from the Parallel Forall Blog ☆94 Updated 5 years ago
- DLA-Future ☆65 Updated this week
- Exploring using stdpar and Cython ☆32 Updated 4 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆146 Updated last year
- Examples for using SYCL on CUDA ☆60 Updated 2 weeks ago
- Next generation library for iterative sparse solvers for ROCm platform ☆76 Updated this week
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆93 Updated 3 weeks ago
- GPU Eigensolver for symmetric/hermitian matrices. ☆64 Updated 3 years ago
- Unit benchmarks of CUDA event APIs. ☆17 Updated 6 months ago
- Header-only C++20 wrapper for MPI 4.0. ☆43 Updated last year
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago
- Interoperability examples for OpenACC. ☆48 Updated 4 years ago
- Template for starting CUDA/C++ project using CMake with Github Action for CI ☆29 Updated last year
- Distributed View Extension for Kokkos ☆43 Updated 2 months ago