GreenWaves-Technologies / bfloat16
bfloat16 dtype for numpy
☆20 · Updated 2 years ago
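For context, bfloat16 keeps float32's sign bit and 8-bit exponent but truncates the mantissa from 23 bits to 7, so conversion is essentially keeping the top 16 bits of the float32 pattern. A round-to-nearest-even conversion can be sketched in plain NumPy as below (the helper names are hypothetical; this library's actual API may differ):

```python
import numpy as np

def float32_to_bfloat16_bits(x):
    """Convert a float32 array to bfloat16 stored as uint16 bit patterns,
    rounding the discarded low 16 bits to nearest, ties to even.
    (NaN/inf are not treated specially in this sketch.)"""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Bias of 0x7FFF plus the lowest kept bit implements ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16).astype(np.uint16)

def bfloat16_bits_to_float32(b):
    """Widen bfloat16 bit patterns back to float32 (exact, no rounding)."""
    return (np.asarray(b, dtype=np.uint32) << 16).view(np.float32)

x = np.array([1.0, 3.14159, -2.5], dtype=np.float32)
bf = float32_to_bfloat16_bits(x)
back = bfloat16_bits_to_float32(bf)
```

Values whose mantissa fits in 7 bits (1.0, -2.5) round-trip exactly; 3.14159 comes back as 3.140625, illustrating the roughly three decimal digits of precision bfloat16 retains.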
Alternatives and similar repositories for bfloat16
Users interested in bfloat16 are comparing it to the libraries listed below.
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆175 · Updated this week
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Updated 3 years ago
- ☆169 · Updated 2 years ago
- ☆160 · Updated 2 years ago
- The Riallto Open Source Project from AMD ☆84 · Updated 10 months ago
- A Deep Learning Framework for the Posit Number System ☆31 · Updated last year
- Torch2Chip (MLSys, 2024) ☆55 · Updated 10 months ago
- Customized matrix multiplication kernels ☆57 · Updated 3 years ago
- Nod.ai 🦈 version of 👻. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ☆107 · Updated last month
- The official, proof-of-concept C++ implementation of PocketNN. ☆36 · Updated 4 months ago
- [TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA ☆17 · Updated 3 years ago
- This project contains a code generator that produces static C NN inference deployment code targeting tiny micro-controllers (TinyML) as r… ☆30 · Updated 4 years ago
- Fork of upstream onnxruntime focused on supporting RISC-V accelerators ☆88 · Updated 2 years ago
- ☆40 · Updated last year
- Sandbox for TVM and playing around! ☆22 · Updated 3 years ago
- ☆68 · Updated 2 years ago
- A framework that helps developers apply structured pruning to TensorFlow models ☆28 · Updated last year
- ☆19 · Updated 2 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 · Updated last month
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 · Updated 6 months ago
- Explore training for quantized models ☆26 · Updated 6 months ago
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆175 · Updated last year
- Converting a deep neural network to integer-only inference in native C via uniform quantization and fixed-point representation. ☆25 · Updated 4 years ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 · Updated last year
- ☆40 · Updated last year
- ☆11 · Updated 4 years ago
- High-speed GEMV kernels, up to 2.7× speedup over the PyTorch baseline. ☆127 · Updated last year
- Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys '20) ☆27 · Updated 2 years ago
- ☆77 · Updated last year
- Curated content for DNN approximation and acceleration, with a focus on hardware accelerators and deployment ☆27 · Updated last year