suruoxi / half
IEEE 754-based C++ half-precision floating point library forked from http://half.sourceforge.net
☆22 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for half
- C99/C++ header-only library for division via fixed-point multiplication by inverse · ☆48 · Updated 6 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. · ☆83 · Updated 8 months ago
- Portable 128-bit SIMD intrinsics · ☆55 · Updated last year
- A header only library implementing common mathematical functions using SIMD intrinsics · ☆92 · Updated 2 weeks ago
- Generate simple index ranges in C++ and CUDA C++ · ☆39 · Updated last year
- flexible-gemm conv of deepcore · ☆17 · Updated 4 years ago
- Simple example of using Vulkan for GPGPU computing · ☆51 · Updated 6 years ago
- UME::SIMD A library for explicit simd vectorization. · ☆90 · Updated 6 years ago
- Realtime GPU Profiler for AMD / NVIDIA / Intel GPUs · ☆31 · Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability · ☆34 · Updated 9 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth · ☆99 · Updated 7 years ago
- An extension library of WMMA API (Tensor Core API) · ☆82 · Updated 3 months ago
- C++ implementation of a 16 bit floating-point type mimicking most of the IEEE 754 behaviour. Compatible with the half data type used as t… · ☆141 · Updated 12 years ago
- Evaluating different memory managers for dynamic GPU memory · ☆24 · Updated 3 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU · ☆77 · Updated 5 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. · ☆42 · Updated 10 months ago
- BGHT: High-performance static GPU hash tables. · ☆55 · Updated last month
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope… · ☆55 · Updated this week
- Conversion to/from half-precision floating point formats · ☆330 · Updated 3 months ago
- Agenium Scale vectorization library for CPUs and GPUs · ☆326 · Updated 3 years ago
- Demonstration of various hardware effects on CUDA GPUs. · ☆356 · Updated 11 months ago
- how to design cpu gemm on x86 with avx256, that can beat openblas. · ☆64 · Updated 5 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. · ☆107 · Updated last year
- A framework that helps implementing swizzle GPU kernels · ☆41 · Updated 4 years ago
- ☆37 · Updated 3 years ago
- Software implementation of ARM and x86 SIMD intrinsics · ☆12 · Updated 5 years ago
- GPU-Accelerated Lossless Data Compressors Survey · ☆110 · Updated 4 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. · ☆70 · Updated 9 years ago
- Full-speed Array of Structures access · ☆160 · Updated last year