suruoxi / half

IEEE 754-based C++ half-precision floating-point library, forked from http://half.sourceforge.net

☆23

Alternatives and similar repositories for half

Users who are interested in half are comparing it to the libraries listed below.

edanor / umesimd
UME::SIMD A library for explicit simd vectorization.
☆90 Updated 7 years ago
Maratyszcza / FXdiv
C99/C++ header-only library for division via fixed-point multiplication by inverse
☆52 Updated last year
reyoung / avx_mathfun
AVX-optimized sin(), cos(), exp() and log() functions
☆124 Updated 3 years ago
dian-lun-lin / taro
Task graph-based asynchronous programming system using C++ coroutine
☆90 Updated last year
NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆84 Updated last year
acgessler / half_float
C++ implementation of a 16 bit floating-point type mimicking most of the IEEE 754 behaviour. Compatible with the half data type used as t…
☆146 Updated 13 years ago
milakov / int_fastdiv
Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
☆70 Updated 9 years ago
nadavrot / fast_log
A fast implementation of log() and exp()
☆53 Updated 2 years ago
owensgroup / GpuBTree
Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019
☆55 Updated 2 years ago
JishinMaster / simd_utils
A header only library implementing common mathematical functions using SIMD intrinsics
☆107 Updated 3 months ago
ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆52 Updated 2 months ago
jrmadsen / PTL
Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q…
☆47 Updated 6 months ago
Heteroflow / Heteroflow
Concurrent CPU-GPU Programming using Task Models
☆103 Updated 5 years ago
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆105 Updated 7 years ago
taskflow / tfprof
Profiling Taskflow Programs through Visualization
☆50 Updated 2 years ago
Const-me / SimdIntroArticle
☆147 Updated last year
ashvardanian / ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
☆98 Updated last week
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆65 Updated last month
Maratyszcza / psimd
Portable 128-bit SIMD intrinsics
☆58 Updated last year
taskflow / work-stealing-queue
A fast work-stealing queue template in C++
☆307 Updated last year
CUDACommunity / CUDACommunityMeetup2021
☆23 Updated 3 years ago
Maratyszcza / FP16
Conversion to/from half-precision floating point formats
☆355 Updated 10 months ago
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆131 Updated last year
PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆81 Updated 5 years ago
harrism / ranger
Generate simple index ranges in C++ and CUDA C++
☆39 Updated last year
berenger-eu / farm-sve
The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i…
☆14 Updated last year
vinjn / GpuProf
Realtime GPU Profiler for AMD / NVIDIA / Intel GPUs
☆32 Updated last year
crosetto / cupq
A CUDA implementation of a priority queue
☆84 Updated 4 years ago
wjakob / dset
Lock-free parallel disjoint set data structure (aka UNION-FIND) with path compression and union by rank
☆64 Updated 9 years ago
malithj / marlin
Library with JIT (Just-in-time) compilation support to optimize performance of small and medium matrix multiplication
☆14 Updated 4 years ago