jeremad / cuda-travis
☆20 · Updated 5 years ago
Related projects
Alternatives and complementary repositories for cuda-travis
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆35 · Updated 2 months ago
- Training examples for SYCL ☆38 · Updated last week
- ☆12 · Updated 3 months ago
- CUDA kernel author's tools ☆109 · Updated 2 years ago
- The C++ Standard Library for your entire system. ☆15 · Updated last month
- A task benchmark ☆40 · Updated 3 months ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆92 · Updated 2 years ago
- Full-speed Array of Structures access ☆162 · Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 · Updated 6 months ago
- Synchronous, single-threaded, library-only SYCL implementation for debugging and verification. ☆27 · Updated 2 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆45 · Updated 9 years ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆57 · Updated 5 months ago
- A library to benchmark CUDA code, similar to google benchmark. ☆28 · Updated 3 years ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Updated 3 years ago
- Advanced Profiling and Analytics for AMD Hardware ☆137 · Updated this week
- A C++ allocator based on cudaMallocManaged ☆23 · Updated 6 years ago
- ☆17 · Updated 10 months ago
- This tool serves as a test harness for different optimization techniques to improve stencil computations performance in shared and distri… ☆20 · Updated 2 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆65 · Updated last year
- Header-only C++20 wrapper for MPI 4.0. ☆14 · Updated last year
- Distributed View Extension for Kokkos ☆43 · Updated 2 months ago
- Autonomic Performance Environment for eXascale (APEX) ☆38 · Updated 3 weeks ago
- Kernel Tuning Toolkit ☆55 · Updated 3 weeks ago
- Use CUDA intrinsics with user-defined types ☆47 · Updated 10 years ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆102 · Updated last year
- sparse matrix pre-processing library ☆81 · Updated 6 months ago
- Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SyCL - all it takes to sum a lot of numbers fast! ☆73 · Updated 6 months ago
- Efficient SpGEMM on GPU using CUDA and CSR ☆50 · Updated last year