suruoxi / half
IEEE 754-based C++ half-precision floating-point library, forked from http://half.sourceforge.net
☆22 Updated 3 years ago
Related projects
Alternatives and complementary repositories for half
- C99/C++ header-only library for division via fixed-point multiplication by inverse ☆49 Updated 7 months ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- Portable 128-bit SIMD intrinsics ☆57 Updated last year
- Software implementation of ARM and x86 SIMD intrinsics ☆12 Updated 5 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆83 Updated 9 months ago
- A simple and fast library allowing to run async tasks and execute task graphs. ☆42 Updated last month
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q… ☆43 Updated last week
- AVX-optimized sin(), cos(), exp() and log() functions ☆113 Updated 2 years ago
- UME::SIMD A library for explicit simd vectorization. ☆90 Updated 6 years ago
- A header only library implementing common mathematical functions using SIMD intrinsics ☆95 Updated this week
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- Concurrent CPU-GPU Programming using Task Models ☆100 Updated 4 years ago
- BGHT: High-performance static GPU hash tables. ☆55 Updated 2 months ago
- mallocMC: Memory Allocator for Many Core Architectures ☆51 Updated last week
- ☆68 Updated 4 years ago
- C++ implementation of a 16 bit floating-point type mimicking most of the IEEE 754 behaviour. Compatible with the half data type used as t… ☆141 Updated 12 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 Updated 10 months ago
- Task graph-based asynchronous programming system using C++ coroutine ☆84 Updated 9 months ago
- Conversion to/from half-precision floating point formats ☆334 Updated 3 months ago
- SYCL Conformance Tests ☆62 Updated last week
- Realtime GPU Profiler for AMD / NVIDIA / Intel GPUs ☆31 Updated last year
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator. ☆33 Updated 3 years ago
- A reference implementation of std::simd, providing data parallel types in the C++ standard ☆12 Updated 4 years ago
- An extension library of WMMA API (Tensor Core API) ☆84 Updated 4 months ago
- SNIG: Accelerated Large Sparse Neural Network Inference using Task Graph Parallelism ☆34 Updated 3 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆52 Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 Updated 7 years ago
- ☆54 Updated this week
- Emulating DMA Engines on GPUs for Performance and Portability ☆34 Updated 9 years ago