KohakuBlueleaf / HakuTPU
An AI accelerator implementation with Xilinx FPGA
☆43 Updated 3 months ago
Alternatives and similar repositories for HakuTPU
Users interested in HakuTPU are comparing it to the repositories listed below.
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆148 Updated last year
- ☆89 Updated last year
- FPGA-based hardware accelerator for Vision Transformer (ViT), with Hybrid-Grained Pipeline. ☆54 Updated 3 months ago
- IC implementation of Systolic Array for TPU ☆239 Updated 6 months ago
- ☆42 Updated 3 weeks ago
- FPGA based Vision Transformer accelerator (Harvard CS205) ☆118 Updated 3 months ago
- IC implementation of TPU ☆124 Updated 5 years ago
- Hardware design of a universal NPU (CNN accelerator) for various convolutional neural networks ☆120 Updated 2 months ago
- Machine-Learning Accelerator System Exploration Tools ☆162 Updated 2 weeks ago
- HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators ☆148 Updated last month
- FREE TPU V3plus for FPGA is the free version of a commercial AI processor (EEP-TPU) for Deep Learning EDGE Inference ☆145 Updated last year
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning ☆86 Updated 8 months ago
- Hardware accelerator for convolutional neural networks ☆43 Updated 2 years ago
- The codes and artifacts associated with our MICRO'22 paper titled: "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware … ☆132 Updated last year
- Deep Learning Accelerator Based on Eyeriss V2 Architecture with custom RISC-V extended instructions ☆190 Updated 4 years ago
- CHARM: Composing Heterogeneous Accelerators on Heterogeneous SoC Architecture ☆142 Updated this week
- INT8 & FP16 multiplier accumulator (MAC) design with UVM verification completed. ☆100 Updated 4 years ago
- A SystemVerilog implementation of Row-Stationary dataflow and Hierarchical Mesh Network-on-Chip Architecture based on Eyeriss CNN Acceler… ☆159 Updated 5 years ago
- An AXI4 crossbar implementation in SystemVerilog ☆148 Updated this week
- SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration (Full Paper Accepted in FPGA'24) ☆31 Updated this week
- AutoSA: Polyhedral-Based Systolic Array Compiler ☆221 Updated 2 years ago
- Vector processor for RISC-V vector ISA ☆117 Updated 4 years ago
- Convolutional accelerator kernel, target ASIC & FPGA ☆199 Updated 2 years ago
- Small-scale Tensor Processing Unit built on an FPGA ☆183 Updated 5 years ago
- Fully opensource spiking neural network accelerator ☆146 Updated 2 years ago
- IEEE 754 floating point unit in Verilog ☆135 Updated 8 years ago
- Vector Acceleration IP core for RISC-V* ☆178 Updated this week
- Allo: A Programming Model for Composable Accelerator Design ☆229 Updated last week
- An Open Workflow to Build Custom SoCs and run Deep Models at the Edge ☆77 Updated this week
- A reading list for SRAM-based Compute-In-Memory (CIM) research. ☆61 Updated 3 months ago