GreenWaves-Technologies / bfloat16Links

bfloat16 dtype for numpy

☆20

Alternatives and similar repositories for bfloat16

Users that are interested in bfloat16 are comparing it to the libraries listed below

Sorting:

fastmachinelearning / qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
☆164Updated last week
IntelLabs / FP8-Emulation-Toolkit
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
☆111Updated 11 months ago
AndrewZhaoLuo / TVM-Sandbox
Sandbox for TVM and playing around!
☆22Updated 2 years ago
cchan / fp8_mul
A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission.
☆14Updated 2 years ago
Qualcomm-AI-research / FP8-quantization
☆163Updated 2 years ago
ucb-bar / onnxruntime-riscv
Fork of upstream onnxruntime focused on supporting risc-v accelerators
☆87Updated 2 years ago
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158Updated 2 years ago
tum-ei-eda / utvm_staticrt_codegen
This project contains a code generator that produces static C NN inference deployment code targeting tiny micro-controllers (TinyML) as r…
☆29Updated 4 years ago
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆41Updated last year
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆57Updated 3 years ago
RaulMurillo / deep-pensieve
A Deep Learning Framework for the Posit Number System
☆30Updated last year
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆39Updated last year
masahi / torchscript-to-tvm
☆68Updated 2 years ago
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https//github.com/openai/triton)
☆47Updated 2 months ago
deepspeedai / DeepSpeed-Kernels
☆71Updated 7 months ago
benja263 / Integer-Only-Inference-for-Deep-Learning-in-Native-C
Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation.
☆26Updated 3 years ago
jax-ml / ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
☆305Updated last week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆85Updated last month
makslevental / nelli
A lightweight, Pythonic, frontend for MLIR
☆80Updated 2 years ago
AMDResearch / Riallto
The Riallto Open Source Project from AMD
☆84Updated 6 months ago
nod-ai / transformer-benchmarks
benchmarking some transformer deployments
☆26Updated 2 years ago
CAS-CLab / BlockConv
[TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA
☆17Updated 3 years ago
brightlaboratory / polydl
☆11Updated 4 years ago
e-dupuis / awesome-approximate-dnn
Curated content for DNN approximation, acceleration ... with a focus on hardware accelerator and deployment
☆27Updated last year
zhenhuaw-me / tflite
Parse TFLite models (*.tflite) EASILY with Python. Check the API at https://zhenhuaw.me/tflite/docs/
☆102Updated 9 months ago
KohakuBlueleaf / HakuTPU
An AI accelerator implementation with Xilinx FPGA
☆68Updated 9 months ago
pulp-platform / quantlab
☆39Updated last year
nod-ai / SRT
Nod.ai 🦈 version of 👻 . You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository …
☆106Updated 10 months ago
ruslangrimov / mnist-minimal-model
Trying to find out what is the minimal model that can achieve 99% accuracy on MNIST dataset
☆27Updated 7 years ago
iree-org / iree-nvgpu
☆50Updated last year