suruoxi / half
IEEE 754-based C++ half-precision floating-point library, forked from http://half.sourceforge.net
☆23Updated 3 years ago
Alternatives and similar repositories for half
Users who are interested in half are comparing it to the libraries listed below.
- C99/C++ header-only library for division via fixed-point multiplication by inverse☆51Updated last year
- Task graph-based asynchronous programming system using C++ coroutine☆89Updated last year
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.☆70Updated 9 years ago
- A header only library implementing common mathematical functions using SIMD intrinsics☆104Updated 2 months ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q…☆46Updated 6 months ago
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs.☆53Updated 5 months ago
- Conversion to/from half-precision floating point formats☆350Updated 9 months ago
- UME::SIMD A library for explicit simd vectorization.☆90Updated 7 years ago
- Portable 128-bit SIMD intrinsics☆58Updated last year
- Concurrent CPU-GPU Programming using Task Models☆102Updated 5 years ago
- Profiling Taskflow Programs through Visualization☆50Updated 2 years ago
- C++ convenience classes to be used with CUDA code, for both the host and the kernel parts.☆55Updated 6 years ago
- AVX-optimized sin(), cos(), exp() and log() functions☆124Updated 3 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆84Updated last year
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i…☆14Updated last year
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!☆96Updated last week
- Looking into the performance of heaps, starting with the Min-Max Heap☆65Updated 4 years ago
- C++ implementation of a 16 bit floating-point type mimicking most of the IEEE 754 behaviour. Compatible with the half data type used as t…☆145Updated 13 years ago
- ☆68Updated 2 years ago
- Agenium Scale vectorization library for CPUs and GPUs☆333Updated 3 years ago
- Lock-free atomic_shared_ptr implementations☆41Updated 11 months ago
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.☆70Updated 6 years ago
- A fast work-stealing queue template in C++☆306Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL☆113Updated 11 months ago
- SNIG: Accelerated Large Sparse Neural Network Inference using Task Graph Parallelism☆34Updated 3 years ago
- Header-only safetensors loader and saver in C++☆61Updated this week
- Realtime GPU Profiler for AMD / NVIDIA / Intel GPUs☆32Updated last year
- ☆23Updated 8 years ago
- Generic system-wide modern C++ for heterogeneous platforms with SYCL from Khronos Group☆76Updated 4 years ago
- A C++ implementation of a LRU cache☆38Updated 4 years ago