0xBYTESHIFT / fp16
class that represents 16-bit floating point (half)
☆12 Updated last year
Related projects
Alternatives and complementary repositories for fp16
- symmetric int8 gemm ☆66 Updated 4 years ago
- A simple and fast library allowing to run async tasks and execute task graphs. ☆42 Updated last month
- how to design cpu gemm on x86 with avx256, that can beat openblas. ☆66 Updated 5 years ago
- Common libraries for PPL projects ☆29 Updated last month
- PyTorch -> ONNX -> TVM for autotuning ☆23 Updated 4 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q… ☆43 Updated last week
- flexible-gemm conv of deepcore ☆17 Updated 4 years ago
- TSDG: An efficient index graph for graph-based nearest neighbor search ☆9 Updated 2 years ago
- C++ fast hierarchical clustering algorithms ☆81 Updated last year
- C99/C++ header-only library for division via fixed-point multiplication by inverse ☆49 Updated 7 months ago
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 Updated last year
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆26 Updated last year
- Deep insight tensorrt, including but not limited to qat, ptq, plugin, triton_inference, cuda ☆12 Updated last week
- An Open Convolutional Neural Network Framework in C++ From Scratch ☆59 Updated 3 years ago
- ResNet Implementation, Training, and Inference Using LibTorch C++ API ☆35 Updated 5 months ago
- Swin Transformer C++ Implementation ☆54 Updated 3 years ago
- the C++ version of Seq2Seq with ncnn ☆23 Updated 3 years ago
- Automatically exported from code.google.com/p/math-neon ☆38 Updated 9 years ago
- A pure C++ implementation of the lowess algorithm using templates ☆21 Updated 9 years ago
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i… ☆13 Updated 10 months ago
- Convert ONNX models to plain C++ code (without dependencies) ☆18 Updated last year
- IEEE 754-based c++ half-precision floating point library forked from http://half.sourceforge.net ☆22 Updated 3 years ago
- Tencent NCNN with added CUDA support ☆67 Updated 3 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆45 Updated 4 years ago
- Optimize GEMM with tensorcore step by step ☆15 Updated 11 months ago
- study of cutlass ☆19 Updated last week