headupinclouds / half
half precision floating point library (fork)
☆7 Updated 9 years ago
Alternatives and similar repositories for half
Users that are interested in half are comparing it to the libraries listed below
- Proof-of-Concept CNN in Halide ☆22 Updated 8 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 Updated 7 years ago
- Optimized half precision GEMM assembly kernels (deprecated due to ROCm) ☆47 Updated 7 years ago
- Communication-Minimizing 2D Convolution in GPU Registers ☆30 Updated 11 years ago
- Convolutional neural networks C++ framework with CPU and GPU (CUDA) backends ☆178 Updated 6 years ago
- A heterogeneous multi-GPU level-3 BLAS library ☆45 Updated 5 years ago
- BLAS OpenCL implementation. ☆15 Updated 10 years ago
- Simple and Cutting-edge Deep Learning Library accelerated with GPU using C++ AMP ☆19 Updated 9 years ago
- Torch FFI-bindings for NNPACK ☆30 Updated 7 years ago
- Visionbase is a clean C implementation of lots of image processing and recognition algorithms. ☆29 Updated 11 years ago
- An OpenCL Torch Utility Library ☆59 Updated 9 years ago
- A fast deep neural network library (CPU) for speech recognition ☆84 Updated 6 years ago
- CNNs in Halide ☆23 Updated 9 years ago
- Library for fast image convolution in neural networks on Intel Architecture ☆29 Updated 7 years ago
- Speeding up and debittering Caffe by adding Halide ☆18 Updated 10 years ago
- ONNX Parser is a tool that automatically generates OpenVX inference code (CNN) from ONNX binary model files. ☆18 Updated 6 years ago
- Deep neural network framework (C/C++/CUDA). ☆31 Updated 9 years ago
- A minimalist Deep Learning framework for embedded Computer Vision ☆46 Updated 5 years ago
- MXNet Implementation of Google's MobileNets v2 ☆11 Updated 7 years ago
- A Lua-based framework for vision. ☆20 Updated 13 years ago
- Sublinear memory optimization for deep learning, reduce GPU memory cost to train deeper nets ☆28 Updated 9 years ago
- Base code and optimized code for the benchmarks used in the PolyMage paper published at ASPLOS 2015 ☆19 Updated 8 years ago
- Torch7 bindings for cuda-convnet2 kernels! ☆40 Updated 8 years ago
- A CUDA implementation of the PageRank Pipeline Benchmark ☆32 Updated 8 years ago
- ☆13 Updated 8 years ago
- Torch extensions ☆27 Updated 8 years ago
- OpenCL implementation of an NN and CNN ☆22 Updated 6 years ago
- Torch C data structures ☆81 Updated 7 years ago
- ☆68 Updated 2 years ago
- Python Binding to NVRTC ☆79 Updated 7 months ago