guoheng / bfloat16
Convert single precision float to bfloat16 (Brain Floating Point) floating-point format
☆14 Updated 5 years ago
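As a rough illustration of what such a conversion involves: bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 single-precision value, so the conversion amounts to keeping the upper 16 bits of the 32-bit pattern. The sketch below is not taken from this repository. The function names are hypothetical, and the choice of round-to-nearest-even (rather than plain truncation) is an assumption about one common approach, not a statement of how the repository does it.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern for a value (hypothetical helper).

    Assumption: round-to-nearest-even; the listed repository may round differently.
    """
    # Reinterpret the value as its 32-bit single-precision bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]

    # NaN: exponent all ones and a non-zero mantissa. Force a mantissa bit
    # in the kept half so the result cannot collapse into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF

    # Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit,
    # then drop the lower 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(bits16: int) -> float:
    """Widen a bfloat16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    b = float32_to_bfloat16_bits(3.14159)
    print(hex(b), bfloat16_bits_to_float32(b))
```

Note that `struct.pack("<f", ...)` raises `OverflowError` for finite Python floats outside the single-precision range, so this sketch only accepts values that already fit in a float32.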
Alternatives and similar repositories for bfloat16:
Users who are interested in bfloat16 are comparing it to the libraries listed below:
- ☆29 Updated 3 years ago
- Implementation of convolution layer in different flavors ☆68 Updated 7 years ago
- Fork of upstream onnxruntime focused on supporting RISC-V accelerators ☆84 Updated 2 years ago
- Fast matrix multiplication for few-bit integer matrices on CPUs. ☆27 Updated 6 years ago
- Example code and instructions on getting TensorFlow Lite running on a Xilinx Zynq ☆49 Updated 7 years ago
- ☆27 Updated 4 years ago
- Ternary Weights and Activations ☆24 Updated 6 years ago
- Implementing a Recurrent Neural Network with binarized weight format on FPGA ☆22 Updated 7 years ago
- ☆36 Updated 2 years ago
- The official, proof-of-concept C++ implementation of PocketNN. ☆32 Updated 10 months ago
- ☆69 Updated 2 years ago
- Approximate layers - TensorFlow extension ☆27 Updated last week
- A self-contained version of the tutorial which can be easily cloned and viewed by others. ☆24 Updated 5 years ago
- An implementation of a BinaryConnect network for CIFAR-10 ☆11 Updated 5 years ago
- ☆58 Updated 3 years ago
- LCAI-TIHU SW is a software stack of the AI inference processor based on RISC-V ☆23 Updated 2 years ago
- An 8-/16-/32-/64-bit floating point number family ☆17 Updated 3 years ago
- A Winograd based kernel for convolutions in deep learning framework ☆15 Updated 7 years ago
- Accelergy is an energy estimation infrastructure for accelerator energy estimations ☆136 Updated 2 months ago
- Explore the energy-efficient dataflow scheduling for neural networks. ☆221 Updated 4 years ago
- ☆39 Updated 7 years ago
- ☆14 Updated 5 years ago
- Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20) ☆27 Updated last year
- BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing ☆137 Updated 5 years ago
- Open Source Compiler Framework using ONNX as Frontend and IR ☆29 Updated 2 years ago
- CK workflow, portable packages and other artifacts for the ReQuEST-ASPLOS'18 submission: ☆12 Updated 6 years ago
- Implementing CNN code in CUDA and OpenCL to evaluate its performance on NVIDIA GPUs, AMD GPUs, and an FPGA platform. ☆54 Updated 8 years ago
- FireSim-NVDLA: NVIDIA Deep Learning Accelerator (NVDLA) Integrated with RISC-V Rocket Chip SoC Running on the Amazon FPGA Cloud ☆160 Updated 3 years ago
- Linear model training using stochastic gradient descent (SGD) on PYNQ with full to low precision. ☆54 Updated 7 years ago
- TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together ☆64 Updated 6 years ago