jeremad / cuda-travis
☆20 · Updated 5 years ago
Related projects:
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆34 · Updated 8 months ago
- Generate simple index ranges in C++ and CUDA C++ ☆38 · Updated last year
- A library to benchmark CUDA code, similar to google benchmark. ☆27 · Updated 3 years ago
- Training examples for SYCL ☆38 · Updated 6 months ago
- Use CUDA intrinsics with user-defined types ☆47 · Updated 10 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆44 · Updated 9 years ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆90 · Updated 2 years ago
- CUDA kernel author's tools ☆105 · Updated 2 years ago
- Full-speed Array of Structures access ☆155 · Updated last year
- ☆15 · Updated 8 months ago
- Kokkos Remote Spaces implements distributed Kokkos Views and related APIs for distributed parallel programming. ☆42 · Updated 2 weeks ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆63 · Updated last year
- Autonomic Performance Environment for eXascale (APEX) ☆38 · Updated this week
- A task benchmark ☆39 · Updated last month
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Updated 3 years ago
- RAJA Performance Suite ☆110 · Updated last week
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆15 · Updated this week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆96 · Updated 7 years ago
- Range-based for loops to iterate over a range of numbers or values ☆35 · Updated 7 years ago
- Reusable software components for ROCm developers ☆81 · Updated last week
- Kernel Tuning Toolkit ☆54 · Updated 3 weeks ago
- sparse matrix pre-processing library ☆81 · Updated 4 months ago
- A library for C++/Fortran computer simulations (e.g. stencil codes, mesh-free, unstructured grids, n-body & particle methods). Scales fro… ☆38 · Updated 3 years ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆97 · Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆21 · Updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆39 · Updated 8 months ago
- Implementation of AMD HIP for CPUs ☆22 · Updated 4 years ago
- ROCm Parallel Primitives ☆156 · Updated this week
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆100 · Updated this week
- portDNN is a library implementing neural network algorithms written using SYCL ☆106 · Updated 3 months ago