KastnerRG / cgra4ml
An Open Workflow to Build Custom SoCs and run Deep Models at the Edge
☆81 · Updated last month
Alternatives and similar repositories for cgra4ml
Users that are interested in cgra4ml are comparing it to the libraries listed below
- ☆94 · Updated last year
- Verilog implementation of a 4x4 systolic array multiplier ☆55 · Updated 4 years ago
- IEEE 754 single- and double-precision floating-point library in SystemVerilog and VHDL ☆67 · Updated 6 months ago
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions ☆35 · Updated 2 months ago
- SAURIA (Systolic-Array tensor Unit for aRtificial Intelligence Acceleration) is an open-source Convolutional Neural Network accelerator b… ☆46 · Updated 8 months ago
- ☆65 · Updated 6 years ago
- ☆47 · Updated 2 months ago
- INT8 & FP16 multiply-accumulate (MAC) design with completed UVM verification ☆103 · Updated 4 years ago
- Multi-core HW accelerator mapping optimization framework for layer-fused ML workloads ☆54 · Updated this week
- FPGA-based hardware accelerator for Vision Transformer (ViT) with a hybrid-grained pipeline ☆63 · Updated 5 months ago
- tpu-systolic-array-weight-stationary ☆24 · Updated 4 years ago
- CHARM: Composing Heterogeneous Accelerators on Heterogeneous SoC Architecture ☆143 · Updated this week
- 16-bit adder and multiplier hardware on Digilent Basys 3 ☆76 · Updated last year
- My implementation of an FPGA deep neural network hardware accelerator, moved from my Bitbucket ☆27 · Updated 5 years ago
- Library of approximate arithmetic circuits ☆55 · Updated 2 years ago
- An open-hardware CGRA for accelerated computation on the edge ☆28 · Updated 9 months ago
- Verilog implementation of the softmax function ☆67 · Updated 2 years ago
- ☆58 · Updated 5 years ago
- ☆42 · Updated 9 months ago
- Synthesizable floating-point unit written in Verilog. Supports 32-bit (single-precision) multiplication, addition and division and Squ… ☆56 · Updated 10 months ago
- 32-bit floating-point operation algorithms implemented in Verilog using logic operations ☆85 · Updated 6 years ago
- A heterogeneous accelerator-centric compute cluster ☆20 · Updated last week
- Systolic matrix multiplication kernel implemented on a Xilinx PYNQ FPGA board ☆14 · Updated 5 years ago
- Simple systolic-array-based TPU for CNNs on PYNQ-Z2 ☆33 · Updated 3 years ago
- RapidStream TAPA compiles task-parallel HLS programs into high-frequency FPGA accelerators ☆172 · Updated this week
- PYNQ Composable Overlays ☆73 · Updated last year
- Hardware accelerator for convolutional neural networks ☆45 · Updated 2 years ago
- ☆71 · Updated 2 years ago
- A collection of tutorials for the fpgaConvNet framework ☆41 · Updated 9 months ago
- An open-source parameterizable NPU generator with a full-stack, multi-target compilation stack for intelligent workloads ☆57 · Updated 3 months ago