0xBYTESHIFT / fp16
Class that represents a 16-bit floating-point (half) value
☆11 Updated last year
Alternatives and similar repositories for fp16:
Users who are interested in fp16 are comparing it to the libraries listed below.
- how to design cpu gemm on x86 with avx256, that can beat openblas. ☆69 Updated 5 years ago
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs. ☆52 Updated 4 months ago
- C++ fast hierarchical clustering algorithms ☆87 Updated last year
- A C++ neural network library for machine learning ☆14 Updated 11 months ago
- Automatically exported from code.google.com/p/math-neon ☆40 Updated 9 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q… ☆45 Updated 4 months ago
- An Open Convolutional Neural Network Framework in C++ From Scratch ☆61 Updated 4 years ago
- C99/C++ header-only library for division via fixed-point multiplication by inverse ☆50 Updated 11 months ago
- a single-header math library ☆16 Updated 5 months ago
- Common libraries for PPL projects ☆29 Updated 3 weeks ago
- Demonstration of a factory pattern where the types automatically register themselves ☆11 Updated 6 years ago
- A pure C++ implementation of the lowess algorithm using templates ☆21 Updated 9 years ago
- Deep insight tensorrt, including but not limited to qat, ptq, plugin, triton_inference, cuda ☆16 Updated last week
- A header only library implementing common mathematical functions using SIMD intrinsics ☆103 Updated last month
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- Conversion to/from half-precision floating point formats ☆346 Updated 8 months ago
- A C++ port of the Python module under the same name ☆52 Updated last year
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.) ☆60 Updated last week
- Convert ONNX models to plain C++ code (without dependencies) ☆20 Updated 2 years ago
- symmetric int8 gemm ☆66 Updated 4 years ago
- Deep Neural Network Architectures with dlib ☆19 Updated 2 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- A single file C++17 header-only Minimal Acyclic Subsequential Transducers, or Finite State Transducers ☆55 Updated 2 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆111 Updated 10 months ago
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope… ☆59 Updated this week
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 Updated last year
- This is a demo how to write a high performance convolution run on apple silicon ☆54 Updated 3 years ago
- Agenium Scale vectorization library for CPUs and GPUs ☆331 Updated 3 years ago
- Portable 128-bit SIMD intrinsics ☆58 Updated last year
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆76 Updated 7 months ago