VCA-EPFL/FSA

VCA-EPFL / FSA

FSA: Fusing FlashAttention within a Single Systolic Array

☆107

Alternatives and similar repositories for FSA

Users who are interested in FSA are comparing it to the repositories listed below.

ucb-bar / virgo
Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC
☆54 · Jan 20, 2026 · Updated 3 months ago
ucb-bar / radiance
A Heterogeneous GPU Platform for AI and Neural Graphics
☆50 · Updated this week
pku-dasys / easymac
☆18 · Feb 3, 2022 · Updated 4 years ago
ConvolutedDog / gpgpu-sim-comments
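GPGPU-Sim code with Chinese annotations: the latest version of the GPGPU-Sim simulator source, commented in Chinese to help Chinese-speaking users better understand and use the simulator.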
☆27 · Dec 18, 2024 · Updated last year
witmemtech / CIM-Technical-Papers-Collection
Computing in memory optimizes data handling by performing operations directly in memory, ideal for high-speed data processing needs. This…
☆33 · Nov 22, 2024 · Updated last year
huweim / dataflow_architecture
Research about dataflow architecture
☆12 · Nov 30, 2023 · Updated 2 years ago
AlessandroCilardo / NaplesPU
The official NaplesPU hardware code repository
☆24 · Jul 27, 2019 · Updated 6 years ago
pku-liang / Cement
The Next-gen Language & Compiler Powering Efficient Hardware Design
☆37 · Jan 16, 2025 · Updated last year
kelvin0207 / SparSynergy
Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib…
☆23 · Mar 29, 2025 · Updated last year
ryanmacdonald / Ray-Tracing-GPU
RTL implementation of a ray-tracing GPU
☆15 · Dec 18, 2012 · Updated 13 years ago
thustorage / Frugal
for paper @ ASPLOS '25
☆16 · Mar 27, 2025 · Updated last year
arkhadem / aim_simulator
A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0
☆64 · Jul 22, 2025 · Updated 9 months ago
VincentWang1998 / ai_on_chip_project1
tpu-systolic-array-weight-stationary
☆25 · May 7, 2021 · Updated 4 years ago
oscc-web / ysyx-website
The official website of One Student One Chip project.
☆12 · Feb 5, 2026 · Updated 3 months ago
ReaLLMASIC / ReaLLM-Forge
A Framework for Hardware-Aware LLM Exploration
☆37 · Apr 23, 2026 · Updated last week
taraeicher / PerceptronBranchPredictor
Perceptron-based branch predictor written in C++
☆13 · Dec 14, 2016 · Updated 9 years ago
lycfly / EasyNPU
A small Neural Network Processor for Edge devices.
☆19 · Nov 22, 2022 · Updated 3 years ago
zjnyly / TeraFly
[DATE'2025, TCAD'2025] Terafly: A Multi-Node FPGA Based Accelerator Design for Efficient Cooperative Inference in LLMs
☆36 · Nov 13, 2025 · Updated 5 months ago
Tanggling / BOOMCore
This is a project created and completed by team BOOM (Beihang OO masters). This is a superscalar processor with a 13-stage out-of-order dua…
☆18 · Sep 29, 2024 · Updated last year
CRAFT-THU / ActiveN
RISC-V-based many-core neuromorphic architecture
☆16 · Apr 13, 2026 · Updated 3 weeks ago
CMU-SAFARI / MemBen
Benchmark suite containing cache filtered traces for use with Ramulator. These include some of the workloads used in our SIGMETRICS 2019 …
☆23 · Oct 9, 2020 · Updated 5 years ago
hyupupup / conv_systolic_array
(Verilog) A simple convolution layer implementation with systolic array structure
☆13 · May 9, 2022 · Updated 3 years ago
abdelazeem201 / Systolic-array-implementation-in-RTL-for-TPU
IC implementation of Systolic Array for TPU
☆353 · Oct 21, 2024 · Updated last year
IPADS-SAI / WaferAI-SIM
The wafer-native AI accelerator simulation platform and inference engine.
☆55 · Jan 1, 2026 · Updated 4 months ago
sjtu-zhao-lab / SALO
An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences
☆32 · Mar 7, 2024 · Updated 2 years ago
ucb-bar / gemmini
Berkeley's Spatial Array Generator
☆1,294 · Mar 29, 2026 · Updated last month
CMU-SAFARI / ramulator2
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and …
☆539 · Apr 20, 2026 · Updated 2 weeks ago
tancheng / mlir-cgra
An MLIR dialect to enable the efficient acceleration of ML model on CGRAs.
☆65 · Oct 9, 2024 · Updated last year
agurfinkel / btor2mlir
BTOR2 MLIR project
☆26 · Jan 17, 2024 · Updated 2 years ago
SET-Scheduling-Project / GEMINI-HPCA2024
Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators
☆112 · Apr 28, 2025 · Updated last year
belanoa / softex
☆16 · Oct 20, 2025 · Updated 6 months ago
AmbiML / iree-rv32-springbok
☆23 · Mar 15, 2023 · Updated 3 years ago
SET-Scheduling-Project / SET-ISCA2023
The framework for the paper "Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators" in ISCA 2023.
☆83 · Mar 12, 2025 · Updated last year
csail-csg / pyverilator
Python wrapper for verilator model
☆94 · Feb 10, 2024 · Updated 2 years ago
JZF-JZF / My-Code
C++ Code
☆10 · Aug 13, 2019 · Updated 6 years ago
QiMeng-IPRC / QiMeng-cpu-v1
[IJCAI 2024] QiMeng-CPU-v1: Automated CPU Design by Learning from Input-Output Examples
☆29 · May 4, 2025 · Updated last year
hkust-zhiyao / RTLLM
An open-source benchmark for generating design RTL with natural language
☆188 · Nov 8, 2024 · Updated last year
leesou / H2-LLM-ISCA-2025
H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
☆100 · Apr 26, 2025 · Updated last year
wqzustc / High-Performance-Tensor-Processing-Engines
Some Hardware Architectures for GEMM
☆293 · May 22, 2025 · Updated 11 months ago