KohakuBlueleaf / HakuTPU

An AI accelerator implementation with Xilinx FPGA

☆46

Alternatives and similar repositories for HakuTPU

Users who are interested in HakuTPU are comparing it to the repositories listed below.

rejunity / tiny-asic-1_58bit-matrix-mul
Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit
☆155 Updated last year
HLSTransform / submission
☆94 Updated last year
pulp-platform / occamy
A high-efficiency system-on-chip for floating-point compute workloads.
☆36 Updated 5 months ago
hguq / HG-PIPE
FPGA-based hardware accelerator for Vision Transformer (ViT), with Hybrid-Grained Pipeline.
☆68 Updated 5 months ago
abdelazeem201 / Systolic-array-implementation-in-RTL-for-TPU
IC implementation of Systolic Array for TPU
☆251 Updated 8 months ago
8krisv / CNN-ACCELERATOR
Hardware accelerator for convolutional neural networks
☆45 Updated 2 years ago
ic-lab-duth / RISC-V-Vector
Vector processor for RISC-V vector ISA
☆121 Updated 4 years ago
dpretet / axi-crossbar
An AXI4 crossbar implementation in SystemVerilog
☆160 Updated last week
embedeep / FREE-TPU-V3plus-for-FPGA
FREE TPU V3plus for FPGA is the free version of a commercial AI processor (EEP-TPU) for Deep Learning EDGE Inference
☆152 Updated 2 years ago
yuyuranium / FPGA-Project-2022-simple-tpu
Systolic-array-based simple TPU for CNN on PYNQ-Z2
☆33 Updated 3 years ago
gnodipac886 / ViT-FPGA-TPU
FPGA-based Vision Transformer accelerator (Harvard CS205)
☆124 Updated 4 months ago
pulp-platform / ITA
☆47 Updated 2 months ago
erihsu / INT_FP_MAC
INT8 & FP16 multiplier accumulator (MAC) design with UVM verification completed.
☆103 Updated 4 years ago
thousrm / universal_NPU-CNN_accelerator
Hardware design of a universal NPU (CNN accelerator) for various convolutional neural networks
☆128 Updated 3 months ago
SingularityKChen / dl_accelerator
Deep Learning Accelerator Based on Eyeriss V2 Architecture with custom RISC-V extended instructions
☆195 Updated 5 years ago
DeepWok / mase
Machine-Learning Accelerator System Exploration Tools
☆168 Updated 3 weeks ago
debtanu09 / systolic_array_matrix_multiplier
A Verilog implementation of a 4x4 systolic array multiplier
☆56 Updated 4 years ago
sfmth / OpenSpike
Fully open-source spiking neural network accelerator
☆152 Updated 2 years ago
cameronshinn / tiny-tpu
Small-scale Tensor Processing Unit built on an FPGA
☆191 Updated 5 years ago
XUANTIE-RV / riscv-matrix-extension-spec
A matrix extension proposal for AI applications under RISC-V architecture
☆148 Updated 4 months ago
MartaAndronic / PolyLUT
PolyLUT is the first quantized neural network training methodology that maps a neuron to a LUT while using multivariate polynomial functi…
☆53 Updated last year
leo47007 / TPU-Tensor-Processing-Unit
IC implementation of TPU
☆124 Updated 5 years ago
taichi-ishitani / tnoc
Network-on-chip implementation written in SystemVerilog
☆178 Updated 2 years ago
suoglu / Fixed-Floating-Point-Adder-Multiplier
16-bit Adder Multiplier hardware on Digilent Basys 3
☆76 Updated last year
intel / fpga-npu
☆192 Updated last year
maomran / softmax
Verilog implementation of Softmax function
☆67 Updated 2 years ago
KastnerRG / cgra4ml
An Open Workflow to Build Custom SoCs and run Deep Models at the Edge
☆81 Updated last month
karthisugumar / CSE240D-Hierarchical_Mesh_NoC-Eyeriss_v2
A SystemVerilog implementation of Row-Stationary dataflow and Hierarchical Mesh Network-on-Chip Architecture based on Eyeriss CNN Acceler…
☆162 Updated 5 years ago
ucb-bar / constellation
A Chisel RTL generator for network-on-chip interconnects
☆203 Updated last month
lirui-shanghaitech / CNN-Accelerator-VLSI
Convolutional accelerator kernel, target ASIC & FPGA
☆211 Updated 2 years ago