Richielee630 / TMMALinks

TMMA: A Tiled Matrix Multiplication Accelerator for Self-Attention Projections in Transformer Models, optimized for edge deployment on Xilinx KV260.

☆23

Alternatives and similar repositories for TMMA

Users that are interested in TMMA are comparing it to the libraries listed below

Sorting:

GATECH-EIC / ViTCoD
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
☆122Updated 2 years ago
mit-han-lab / spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
☆111Updated last year
arc-research-lab / SSR
SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration (Full Paper Accepted in FPGA'24)
☆33Updated this week
gnodipac886 / ViT-FPGA-TPU
FPGA based Vision Transformer accelerator (Harvard CS205)
☆137Updated 8 months ago
shihuihong214 / P2-ViT
☆10Updated last year
GATECH-EIC / ViTALiTy
ViTALiTy (HPCA'23) Code Repository
☆23Updated 2 years ago
cjg91 / trans-fat
An FPGA Accelerator for Transformer Inference
☆91Updated 3 years ago
jha-lab / acceltran
[TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers
☆54Updated last year
Hazuyuki / PIM-HLS
☆12Updated 2 years ago
ribesstefano / Mapping-Multiple-LSTM-Models-on-FPGAs
Includes the SVD-based approximation algorithms for compressing deep learning models and the FPGA accelerators exploiting such approximat…
☆16Updated 2 years ago
hguq / HG-PIPE
FPGA-based hardware accelerator for Vision Transformer (ViT), with Hybrid-Grained Pipeline.
☆98Updated 9 months ago
sharc-lab / Edge-MoE
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
☆127Updated last year
IBM / 3D-CiM-LLM-Inference-Simulator
Simulator for LLM inference on an abstract 3D AIMC-based accelerator
☆24Updated last month
hazooree / LeNet-CNN-Accelerator-Hardware-for-FPGA
An open source Verilog Based LeNet-1 Parallel CNNs Accelerator for FPGAs in Vivado 2017
☆19Updated 6 years ago
leesou / H2-LLM-ISCA-2025
H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
☆75Updated 6 months ago
PingchengDong / GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
☆20Updated 10 months ago
UCLA-VAST / Serpens
Serpens is an HBM FPGA accelerator for SpMV
☆22Updated last year
shieldforever / NeuronQuant
[ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation
☆13Updated 8 months ago
sjtu-zhao-lab / SALO
An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences
☆30Updated last year
hatsu3 / Sanger
☆47Updated 4 years ago
BoChen-Ye / Tiny_LeViT_Hardware_Accelerator
This is my hobby project with System Verilog to accelerate LeViT Network which contain CNN and Attention layer.
☆23Updated last year
hyupupup / conv_systolic_array
(Verilog) A simple convolution layer implementation with systolic array structure
☆13Updated 3 years ago
maeri-project / FEATHER
A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
☆68Updated last week
CASR-HKU / MSD-FCCM23
Open-source of MSD framework
☆16Updated 2 years ago
abdelfattah-lab / BitMoD-HPCA-25
☆51Updated 3 months ago
mit-emze / cimloop
☆70Updated last month
fffasttime / AnyPackingNet
☆30Updated 7 months ago
maestro-project / AIrchitect-v2
[DATE 2025] Official implementation and dataset of AIrchitect v2: Learning the Hardware Accelerator Design Space through Unified Represen…
☆17Updated 9 months ago
fangjh21 / PALM
PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
☆19Updated last year
VincentWang1998 / ai_on_chip_project1
tpu-systolic-array-weight-stationary
☆24Updated 4 years ago