eyalroz / cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
☆797 · Updated this week
Related projects
Alternatives and complementary repositories for cuda-api-wrappers
- stdgpu: Efficient STL-like Data Structures on the GPU ☆1,162 · Updated this week
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆518 · Updated 5 months ago
- ☆486 · Updated this week
- An efficient C++17 GPU numerical computing library with Python-like syntax ☆1,220 · Updated this week
- A lightweight high performance tensor algebra framework for modern C++ ☆751 · Updated 7 months ago
- CUDA Kernel Benchmarking Library ☆519 · Updated this week
- CUDA Core Compute Libraries ☆1,278 · Updated this week
- CUDA kernel author's tools ☆109 · Updated 2 years ago
- Abstraction Library for Parallel Kernel Acceleration ☆356 · Updated this week
- Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template… ☆355 · Updated 3 months ago
- Open Source Parallel STL implementation ☆517 · Updated 9 months ago
- oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html ☆724 · Updated this week
- std::simd for GCC [ISO/IEC TS 19570:2018] ☆579 · Updated last year
- SYCL Academy, a set of learning materials for SYCL heterogeneous programming ☆459 · Updated this week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,684 · Updated last year
- Expressive Vector Engine - SIMD in C++ Goes Brrrr ☆964 · Updated this week
- Patterns and behaviors for GPU computing ☆1,667 · Updated 2 years ago
- Demonstration of various hardware effects on CUDA GPUs. ☆358 · Updated 11 months ago
- Agenium Scale vectorization library for CPUs and GPUs ☆328 · Updated 3 years ago
- Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C+… ☆1,390 · Updated this week
- Vector class library, latest version ☆1,308 · Updated 9 months ago
- Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++. ☆344 · Updated 2 years ago
- RAJA Performance Portability Layer (C++) ☆488 · Updated this week
- RAPIDS Memory Manager ☆492 · Updated this week
- oneAPI Math Kernel Library (oneMKL) Interfaces ☆622 · Updated this week
- Reference implementation of mdspan targeting C++23 ☆413 · Updated last week
- SIMD Vector Classes for C++ ☆1,458 · Updated 5 months ago
- An implementation of BLAS using the SYCL open standard. ☆259 · Updated 2 weeks ago
- C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE) ☆2,211 · Updated last week
- Intel TBB with CMake build system ☆371 · Updated 2 years ago