JishinMaster / simd_utils
A header-only library implementing common mathematical functions using SIMD intrinsics
☆104 · Updated 2 months ago
Alternatives and similar repositories for simd_utils
Users who are interested in simd_utils are comparing it to the libraries listed below.
- AVX-optimized sin(), cos(), exp() and log() functions ☆124 · Updated 3 years ago
- A fast implementation of log() and exp() ☆53 · Updated 2 years ago
- Cross-platform, portable accelerated math library using universal intrinsics. ☆80 · Updated 4 years ago
- Agenium Scale vectorization library for CPUs and GPUs ☆333 · Updated 3 years ago
- C++20 implementation of a 16-bit floating-point type mimicking most of the IEEE 754 behavior. Single file and header-only. ☆41 · Updated last year
- Struct-of-Arrays generator for C++ projects. ☆51 · Updated 9 months ago
- NanoSTL, a small subset of C++ STL and libm ☆126 · Updated 4 months ago
- Mirror of the Cephes C source for reference ☆92 · Updated last year
- Add-on packages for the Vector Class Library ☆74 · Updated last year
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs. ☆53 · Updated 5 months ago
- ☆31 · Updated 3 years ago
- SIMD-optimised library for matrix inversion of 2x2, 3x3, and 4x4 matrices. ☆93 · Updated 9 years ago
- ⏱️ Single-header benchmark framework for C and C++ ☆231 · Updated 8 months ago
- Compact SVO-optimized vector for C++17 or higher ☆103 · Updated 11 months ago
- C++ implementation of a 16-bit floating-point type mimicking most of the IEEE 754 behaviour. Compatible with the half data type used as t… ☆145 · Updated 13 years ago
- Reference implementation of Grisu-Exact in C++ ☆62 · Updated 4 years ago
- Source code for 'Modern Parallel Programming with C++ and Assembly' by Dan Kusswurm ☆63 · Updated 3 years ago
- ☆148 · Updated last year
- Conversion to/from half-precision floating point formats ☆350 · Updated 9 months ago
- UME::SIMD A library for explicit SIMD vectorization. ☆90 · Updated 7 years ago
- A curated list of awesome SIMD frameworks, libraries and software ☆181 · Updated 7 months ago
- C++20 Tensor library ☆27 · Updated 2 weeks ago
- Task graph-based asynchronous programming system using C++ coroutines ☆89 · Updated last year
- Fast random number generators: Vectorized (SIMD) version of xorshift128+ ☆115 · Updated 4 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last month
- C++ custom memory allocators ☆58 · Updated 4 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 · Updated 9 years ago
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast! ☆96 · Updated last week
- A fully featured single header library implementing a vector container with a small buffer optimization. ☆59 · Updated 2 weeks ago
- Modified DirectXMath for cross-platform compiling ☆35 · Updated 8 years ago