GreenWaves-Technologies / bfloat16
bfloat16 dtype for numpy
☆20 · Updated 2 years ago
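For context, bfloat16 is simply float32 with the low 16 mantissa bits dropped: it keeps float32's full 8-bit exponent range at roughly three significant decimal digits of precision. Below is a minimal sketch of that truncation using plain NumPy bit arithmetic; it illustrates the format only and is not this package's API (the package registers bfloat16 as a proper NumPy dtype):

```python
import numpy as np

def float32_to_bfloat16_bits(x):
    """Truncate float32 values to bfloat16 by keeping the top 16 bits.

    bfloat16 shares float32's 8-bit exponent, so conversion is just a
    right-shift of the float32 bit pattern (round-to-nearest is omitted
    here for brevity; real implementations round rather than truncate).
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def bfloat16_bits_to_float32(b):
    """Widen stored bfloat16 bit patterns back to float32 (exact)."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0, 3.14159, -2.5e-3], dtype=np.float32)
b = float32_to_bfloat16_bits(x)
print(bfloat16_bits_to_float32(b))  # values rounded to ~3 significant digits
```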
Alternatives and similar repositories for bfloat16
Users interested in bfloat16 are comparing it to the libraries listed below.
- ☆160 · Updated 2 years ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆173 · Updated last week
- ☆170 · Updated 2 years ago
- [TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA ☆17 · Updated 3 years ago
- The official, proof-of-concept C++ implementation of PocketNN. ☆36 · Updated 4 months ago
- A sandbox for experimenting with TVM. ☆22 · Updated 3 years ago
- ☆29 · Updated 7 months ago
- Explore training for quantized models ☆26 · Updated 6 months ago
- ☆68 · Updated 2 years ago
- Tool for the deployment and analysis of TinyML applications on TFLM and MicroTVM backends ☆33 · Updated this week
- Prototype routines for GPU quantization written using PyTorch. ☆21 · Updated 2 weeks ago
- GPTQ inference TVM kernel ☆41 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 · Updated last year
- Fork of upstream onnxruntime focused on supporting RISC-V accelerators ☆88 · Updated 2 years ago
- A Data-Centric Compiler for Machine Learning ☆85 · Updated last month
- Quantize transformers to arbitrary learned 4-bit numeric formats ☆51 · Updated last week
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Updated 3 years ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 · Updated 6 months ago
- The open-source version of TinyTS. The code is still rough and may be cleaned up in the future. ☆19 · Updated 5 months ago
- ☆50 · Updated last year
- High-speed GEMV kernels with up to a 2.7x speedup over the PyTorch baseline. ☆127 · Updated last year
- Converting a deep neural network to integer-only inference in native C via uniform quantization and fixed-point representation (see the quantization sketch after this list). ☆26 · Updated 4 years ago
- Torch2Chip (MLSys 2024) ☆55 · Updated 9 months ago
- Nod.ai 🦈 version of 👻. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ☆107 · Updated last month
- Framework that reduces autotune overhead to zero for well-known deployments. ☆94 · Updated 4 months ago
- Inference framework for MoE layers based on TensorRT, with Python bindings ☆41 · Updated 4 years ago
- Open Source Projects from Pallas Lab ☆21 · Updated 4 years ago
- Repository for CPU Kernel Generation for LLM Inference ☆27 · Updated 2 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆48 · Updated 5 months ago
- LLaMA INT4 CUDA inference with AWQ ☆55 · Updated last year
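As a companion to the integer-only inference entry above, here is a minimal NumPy sketch of symmetric uniform quantization, the first step in converting a network to fixed-point arithmetic. The helper name and the per-tensor scaling scheme are illustrative assumptions, not that repository's API:

```python
import numpy as np

def quantize_uniform(w, n_bits=8):
    """Symmetric uniform quantization of a weight tensor to signed integers.

    Maps floats in [-max|w|, +max|w|] onto [-(2^(n-1)-1), 2^(n-1)-1];
    inference can then run on the integers plus one per-tensor scale.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_uniform(w)
print(np.abs(w - q * scale).max())  # quantization error, bounded by scale/2
```

The integer tensor plus a single scale factor is all an integer-only C runtime needs; the floating-point division happens once, offline.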