FSA: Fusing FlashAttention within a Single Systolic Array
☆112 · Apr 15, 2026 · Updated last month
Alternatives and similar repositories for FSA
Users that are interested in FSA are comparing it to the libraries listed below.
- Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC ☆56 · Jan 20, 2026 · Updated 4 months ago
- A Heterogeneous GPU Platform for AI and Neural Graphics ☆56 · May 6, 2026 · Updated 2 weeks ago
- ☆18 · Feb 3, 2022 · Updated 4 years ago
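- GPGPU-Sim with Chinese annotations: the latest GPGPU-Sim simulator code, annotated in Chinese to help Chinese-speaking users better understand and use the simulator. ☆28 · Dec 18, 2024 · Updated last year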
- Computing in memory optimizes data handling by performing operations directly in memory, ideal for high-speed data processing needs. This… ☆35 · Nov 22, 2024 · Updated last year
- The Next-gen Language & Compiler Powering Efficient Hardware Design ☆38 · Jan 16, 2025 · Updated last year
- The official NaplesPU hardware code repository ☆24 · Jul 27, 2019 · Updated 6 years ago
- Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib… ☆25 · Mar 29, 2025 · Updated last year
- Research about dataflow architecture ☆14 · Nov 30, 2023 · Updated 2 years ago
- RTL implementation of a ray-tracing GPU ☆16 · Dec 18, 2012 · Updated 13 years ago
- for paper @ ASPLOS'25 ☆16 · Mar 27, 2025 · Updated last year
- A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0 ☆64 · Jul 22, 2025 · Updated 10 months ago
- tpu-systolic-array-weight-stationary ☆25 · May 7, 2021 · Updated 5 years ago
- The official website of the One Student One Chip project. ☆12 · Feb 5, 2026 · Updated 3 months ago
- A Framework for Hardware-Aware LLM Exploration ☆37 · Updated this week
- Perceptron-based branch predictor written in C++ ☆14 · Dec 14, 2016 · Updated 9 years ago
- A small Neural Network Processor for Edge devices. ☆19 · Nov 22, 2022 · Updated 3 years ago
- [DATE'2025, TCAD'2025] Terafly: A Multi-Node FPGA Based Accelerator Design for Efficient Cooperative Inference in LLMs