poweic / libcumatrix
GPU Matrix Library - A CUDA-based C++ wrapper and syntactic sugar for NVIDIA cuBLAS
☆28Updated 9 years ago
Alternatives and similar repositories for libcumatrix
Users who are interested in libcumatrix are comparing it to the libraries listed below.
- Full-speed Array of Structures access☆174Updated 2 years ago
- Combined array and automatic differentiation library in C++☆176Updated 3 months ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.☆71Updated 9 years ago
- Fast matrix multiplication☆29Updated 4 years ago
- Automatic Differentiation C++ Library☆57Updated 4 years ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated 2 years ago
- C++ Library for Portable SIMD Vectorization☆84Updated 9 months ago
- Boost.uBlas☆116Updated 3 weeks ago
- Launching collective tasks in bulk☆37Updated 5 years ago
- Flexible Library for Efficient Numerical Solutions☆127Updated 2 months ago
- portDNN is a library implementing neural network algorithms written using SYCL☆113Updated last year
- Fast log and exp functions for AVX2/AVX-512☆233Updated 5 months ago
- Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.☆348Updated 3 years ago
- Execution primitives for C++☆153Updated 5 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q…☆48Updated 9 months ago
- Portable and vendor neutral framework for parallel programming on heterogeneous platforms.☆431Updated 2 weeks ago
- C++ code for computing 1D and 2D convolution products using the FFT, implemented with GSL or FFTW☆59Updated 12 years ago
- ☆42Updated 6 years ago
- C++ multidimensional arrays in the spirit of the STL☆201Updated 3 months ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.☆59Updated 3 years ago
- NumPy-compatible multidimensional arrays in C++☆161Updated 10 months ago
- C99/C++ header-only library for division via fixed-point multiplication by inverse☆55Updated last year
- ☆74Updated 2 years ago
- An OpenMP runtime implemented using HPX☆24Updated 3 years ago
- A portable high-level API with CUDA or OpenCL back-end☆54Updated 7 years ago
- A Light-weight and Fast Template Matrix Library☆134Updated 12 years ago
- Parallel k-D Tree Construction☆57Updated 13 years ago
- ☆68Updated 3 years ago
- UME::SIMD: a library for explicit SIMD vectorization.☆91Updated 7 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.☆262Updated 7 months ago