0xBYTESHIFT / fp16
A class that represents a 16-bit floating-point (half) value
☆11 · Updated last year
Related projects
Alternatives and complementary repositories for fp16
- An Open Convolutional Neural Network Framework in C++ From Scratch ☆59 · Updated 3 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q… ☆43 · Updated 3 months ago
- Common libraries for PPL projects ☆29 · Updated 3 weeks ago
- Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation. ☆20 · Updated 2 years ago
- A simple and fast library for running async tasks and executing task graphs. ☆41 · Updated 3 weeks ago
- TSDG: An efficient index graph for graph-based nearest neighbor search ☆9 · Updated 2 years ago
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS. ☆64 · Updated 5 years ago
- A minimalistic header-only C++11 Neural Network library based on Eigen::Tensor ☆20 · Updated 6 years ago
- PyTorch -> ONNX -> TVM for autotuning ☆23 · Updated 4 years ago
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 · Updated last year
- flexible-gemm conv of deepcore ☆17 · Updated 4 years ago
- MLPerf™ Mobile models ☆24 · Updated 3 weeks ago
- High-Performance Computing: CPU Instructions, GPU OpenCL & CUDA, etc. ☆14 · Updated 5 months ago
- Convert ONNX models to plain C++ code (without dependencies) ☆18 · Updated last year
- study of cutlass ☆19 · Updated last year
- Yet another Polyhedra Compiler for DeepLearning ☆19 · Updated last year
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 · Updated 8 years ago
- The repository targets the OpenCL gemm function performance optimization. It compares several libraries clBLAS, clBLAST, MIOpenGemm, Inte… ☆16 · Updated 5 years ago
- symmetric int8 gemm ☆66 · Updated 4 years ago
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆75 · Updated 3 months ago
- Portable 128-bit SIMD intrinsics ☆55 · Updated last year
- ResNet Implementation, Training, and Inference Using LibTorch C++ API ☆35 · Updated 5 months ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 · Updated 9 years ago
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆52 · Updated 2 years ago
- C99/C++ header-only library for division via fixed-point multiplication by inverse ☆48 · Updated 6 months ago