FSA: Fusing FlashAttention within a Single Systolic Array
☆99 · Apr 6, 2026 · Updated last week
Alternatives and similar repositories for FSA
Users who are interested in FSA are comparing it to the libraries listed below.
- Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC · ☆53 · Jan 20, 2026 · Updated 2 months ago
- A Heterogeneous GPU Platform for Chipyard SoC · ☆50 · Apr 3, 2026 · Updated last week
- ☆18 · Feb 3, 2022 · Updated 4 years ago
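- A Chinese-annotated version of the GPGPU-Sim source code. It contains the latest GPGPU-Sim simulator code with Chinese comments to help Chinese-speaking users better understand and use the simulator. · ☆26 · Dec 18, 2024 · Updated last year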
- Computing in memory optimizes data handling by performing operations directly in memory, ideal for high-speed data processing needs. This… · ☆32 · Nov 22, 2024 · Updated last year
- Research about dataflow architecture · ☆12 · Nov 30, 2023 · Updated 2 years ago
- The Next-gen Language & Compiler Powering Efficient Hardware Design · ☆36 · Jan 16, 2025 · Updated last year
- Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib… · ☆23 · Mar 29, 2025 · Updated last year
- RTL implementation of a ray-tracing GPU · ☆15 · Dec 18, 2012 · Updated 13 years ago
- For the paper @ ASPLOS '25 · ☆16 · Mar 27, 2025 · Updated last year
- A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0 · ☆63 · Jul 22, 2025 · Updated 8 months ago
- tpu-systolic-array-weight-stationary · ☆25 · May 7, 2021 · Updated 4 years ago
- A small Neural Network Processor for Edge devices. · ☆18 · Nov 22, 2022 · Updated 3 years ago
- The official website of One Student One Chip project. · ☆12 · Feb 5, 2026 · Updated 2 months ago
- A Framework for Hardware-Aware LLM Exploration · ☆37 · Updated this week
- Perceptron-based branch predictor written in C++ · ☆13 · Dec 14, 2016 · Updated 9 years ago
- [DATE'2025, TCAD'2025] Terafly: A Multi-Node FPGA Based Accelerator Design for Efficient Cooperative Inference in LLMs · ☆34 · Nov 13, 2025 · Updated 5 months ago
- This is a project created and completed by team BOOM (Beihang OO masters). This is a superscalar processor with a 13-stage out-of-order dua… · ☆18 · Sep 29, 2024 · Updated last year
- RISC-V-based many-core neuromorphic architecture · ☆16 · Updated this week
- Benchmark suite containing cache filtered traces for use with Ramulator. These include some of the workloads used in our SIGMETRICS 2019 … · ☆23 · Oct 9, 2020 · Updated 5 years ago
- Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025 · ☆129 · May 3, 2025 · Updated 11 months ago
- (Verilog) A simple convolution layer implementation with systolic array structure · ☆13 · May 9, 2022 · Updated 3 years ago
- The wafer-native AI accelerator simulation platform and inference engine. · ☆53 · Jan 1, 2026 · Updated 3 months ago
- An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences · ☆32 · Mar 7, 2024 · Updated 2 years ago
- Berkeley's Spatial Array Generator · ☆1,270 · Mar 29, 2026 · Updated 2 weeks ago
- Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and … · ☆530 · Mar 30, 2026 · Updated 2 weeks ago
- An MLIR dialect to enable the efficient acceleration of ML model on CGRAs. · ☆65 · Oct 9, 2024 · Updated last year
- BTOR2 MLIR project · ☆26 · Jan 17, 2024 · Updated 2 years ago
- Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators · ☆112 · Apr 28, 2025 · Updated 11 months ago
- ☆16 · Oct 20, 2025 · Updated 5 months ago
- ☆23 · Mar 15, 2023 · Updated 3 years ago
- H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference · ☆97 · Apr 26, 2025 · Updated 11 months ago
- The framework for the paper "Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators" in ISCA 2023. · ☆83 · Mar 12, 2025 · Updated last year
- Python wrapper for Verilator model · ☆93 · Feb 10, 2024 · Updated 2 years ago
- C++ Code · ☆11 · Aug 13, 2019 · Updated 6 years ago
- An open-source benchmark for generating design RTL with natural language · ☆182 · Nov 8, 2024 · Updated last year
- [IJCAI 2024] QiMeng-CPU-v1: Automated CPU Design by Learning from Input-Output Examples · ☆28 · May 4, 2025 · Updated 11 months ago
- Some Hardware Architectures for GEMM · ☆289 · May 22, 2025 · Updated 10 months ago
- ☆17 · Mar 8, 2025 · Updated last year