JishinMaster / simd_utils
A header-only library implementing common mathematical functions using SIMD intrinsics
☆108 Updated 4 months ago
Alternatives and similar repositories for simd_utils
Users interested in simd_utils are comparing it to the libraries listed below.
- AVX-optimized sin(), cos(), exp() and log() functions ☆124 Updated 3 years ago
- A fast implementation of log() and exp() ☆53 Updated 2 years ago
- Cross platform portable accelerate math library using universal intrinsics. ☆80 Updated 4 years ago
- Add-on packages for Vector class library ☆75 Updated last year
- ☆31 Updated 3 years ago
- Agenium Scale vectorization library for CPUs and GPUs ☆333 Updated 3 years ago
- Struct-of-Arrays generator for C++ projects. ☆57 Updated 10 months ago
- Mirror of the Cephes C source for reference ☆93 Updated last year
- Compact SVO optimized vector for C++17 or higher ☆105 Updated last year
- Fast random number generators: Vectorized (SIMD) version of xorshift128+ ☆117 Updated 4 years ago
- SIMD optimised library for matrix inversion of 2x2, 3x3, and 4x4 matrices. ☆93 Updated 9 years ago
- NanoSTL, a small subset of C++ STL and libm ☆127 Updated 5 months ago
- C++20 Tensor library ☆27 Updated last month
- C++20 and onward collection of high performance data containers and related tools ☆55 Updated this week
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast! ☆99 Updated 3 weeks ago
- Modified DirectXMath for cross-platform compiling ☆35 Updated 8 years ago
- ☆147 Updated last year
- C++ Custom memory allocators ☆60 Updated 5 years ago
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs. ☆53 Updated 6 months ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- "Small Vector" optimization for Modern C++: store up to a small number of items on the stack ☆34 Updated 4 years ago
- UME::SIMD A library for explicit simd vectorization. ☆90 Updated 7 years ago
- CPP20 implementation of a 16-bit floating-point type mimicking most of the IEEE 754 behavior. Single file and header-only. ☆40 Updated last year
- A dynamically-resizable vector with fixed capacity and embedded storage (P0843) ☆24 Updated 3 weeks ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated 2 years ago
- A simple, extensible, portable, efficient and header-only SIMD library! ☆229 Updated 3 years ago
- C++ implementation of a 16 bit floating-point type mimicking most of the IEEE 754 behaviour. Compatible with the half data type used as t… ☆146 Updated 13 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55 Updated 3 months ago
- Fast yet accurate trigonometric functions ☆159 Updated 3 years ago
- C++20 Memory Allocator library ☆35 Updated last month