FSA: Fusing FlashAttention within a Single Systolic Array
☆89 · updated Aug 12, 2025
Alternatives and similar repositories for FSA
Users interested in FSA also compare it with the repositories listed below.
- Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC · ☆49 · updated Jan 20, 2026
- (no description) · ☆18 · updated Feb 3, 2022
- GPGPU-Sim with Chinese annotations: the latest GPGPU-Sim simulator source code, annotated in Chinese to help Chinese-speaking users better understand and use the simulator · ☆28 · updated Dec 18, 2024
- A Heterogeneous GPU Platform for Chipyard SoC · ☆44 · updated this week
- The Next-gen Language & Compiler Powering Efficient Hardware Design · ☆36 · updated Jan 16, 2025
- A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0 · ☆59 · updated Jul 22, 2025
- Computing in memory optimizes data handling by performing operations directly in memory, ideal for high-speed data processing needs. This… · ☆31 · updated Nov 22, 2024
- BTOR2 MLIR project · ☆26 · updated Jan 17, 2024
- LLM is as good as you are · ☆45 · updated Feb 22, 2026
- Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib… · ☆22 · updated Mar 29, 2025
- Open-source framework for the HPCA 2024 paper "Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators" · ☆110 · updated Apr 28, 2025
- tpu-systolic-array-weight-stationary · ☆25 · updated May 7, 2021
- [DATE'2025, TCAD'2025] Terafly: A Multi-Node FPGA Based Accelerator Design for Efficient Cooperative Inference in LLMs · ☆28 · updated Nov 13, 2025
- RTL implementation of a ray-tracing GPU · ☆15 · updated Dec 18, 2012
- Perceptron-based branch predictor written in C++ · ☆12 · updated Dec 14, 2016
- A Docker image for One Student One Chip's debug exam · ☆10 · updated Sep 22, 2023
- An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences · ☆31 · updated Mar 7, 2024
- The framework for the paper "Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators" (ISCA 2023) · ☆82 · updated Mar 12, 2025
- SAT Accel (FPGA 2025): a modern SAT solver on FPGA · ☆14 · updated Mar 13, 2025
- (Verilog) A simple convolution layer implementation with a systolic array structure · ☆13 · updated May 9, 2022
- (no description) · ☆15 · updated Oct 20, 2025
- Research about dataflow architecture · ☆12 · updated Nov 30, 2023
- RISC-V-based many-core neuromorphic architecture · ☆15 · updated Aug 3, 2025
- The wafer-native AI accelerator simulation platform and inference engine · ☆50 · updated Jan 1, 2026
- (no description) · ☆17 · updated Mar 8, 2025
- For a paper at ASPLOS '25 · ☆17 · updated Mar 27, 2025
- Artifact for the paper "DX100: A Programmable Data Access Accelerator for Indirection" (ISCA 2025) · ☆17 · updated Nov 6, 2025
- A Framework for Hardware-Aware LLM Exploration · ☆37 · updated this week
- (no description) · ☆33 · updated Nov 6, 2024
- Artifact for the paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference" (ASPLOS 2025) · ☆125 · updated May 3, 2025
- IC implementation of a systolic array for TPU · ☆339 · updated Oct 21, 2024
- (no description) · ☆17 · updated Oct 7, 2025
- (no description) · ☆224 · updated Oct 24, 2025
- An open-source benchmark for generating design RTL with natural language · ☆160 · updated Nov 8, 2024
- (no description) · ☆38 · updated Dec 28, 2023
- Some hardware architectures for GEMM · ☆289 · updated May 22, 2025
- Berkeley's Spatial Array Generator · ☆1,225 · updated this week
- DATE'24 paper: "Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs" · ☆19 · updated Dec 10, 2024
- A small neural network processor for edge devices · ☆16 · updated Nov 22, 2022