GreenWaves-Technologies / bfloat16
bfloat16 dtype for NumPy
☆16 · Updated 11 months ago
Related projects:
- The Riallto Open Source Project from AMD ☆63 · Updated 3 weeks ago
- ☆63 · Updated 8 months ago
- Simple and fast low-bit matmul kernels in CUDA ☆48 · Updated this week
- A Deep Learning Framework for the Posit Number System ☆23 · Updated last month
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆98 · Updated 9 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆43 · Updated this week
- ☆19 · Updated 5 months ago
- Torch2Chip (MLSys, 2024) ☆49 · Updated 3 weeks ago
- Low Precision Arithmetic Simulation in PyTorch - extension for posit and beyond ☆13 · Updated last year
- BARVINN: A Barrel RISC-V Neural Network Accelerator: https://barvinn.readthedocs.io/en/latest/ ☆75 · Updated last month
- News and Paper Collections for Machine Learning Hardware ☆20 · Updated 4 months ago
- A list of awesome neural symbolic papers. ☆37 · Updated 2 years ago
- Fork of upstream onnxruntime focused on supporting risc-v accelerators ☆77 · Updated last year
- Nod.ai 🦈 version of 👻 . You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ☆107 · Updated this week
- VeRLPy is an open-source python library developed to improve the digital hardware verification process by using Reinforcement Learning (R… ☆22 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆58 · Updated 6 months ago
- muRISCV-NN is a collection of efficient deep learning kernels for embedded platforms and microcontrollers. ☆57 · Updated last week
- ☆50 · Updated 3 months ago
- ☆113 · Updated last year
- Adaptive floating-point based numerical format for resilient deep learning ☆14 · Updated 2 years ago
- Customized matrix multiplication kernels ☆53 · Updated 2 years ago
- ☆151 · Updated last year
- TensorCore Vector Processor for Deep Learning - Google Summer of Code Project ☆20 · Updated 3 years ago
- LLM4HWDesign Starting Toolkit ☆15 · Updated this week
- ☆11 · Updated 3 years ago
- FlexASR: A Reconfigurable Hardware Accelerator for Attention-based Seq-to-Seq Networks ☆42 · Updated 2 years ago
- Fast sparse deep learning on CPUs ☆51 · Updated last year
- Training with Block Minifloat number representation ☆14 · Updated 3 years ago
- An 8-/16-/32-/64-bit floating point number family ☆15 · Updated 2 years ago