PALM: An Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
☆22 · Jun 12, 2024 · Updated last year
Alternatives and similar repositories for PALM
Users that are interested in PALM are comparing it to the libraries listed below
- The framework for the paper "Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators" in ISCA 2023. ☆82 · Mar 12, 2025 · Updated last year
- This repository contains the code for this paper: Chiplet-Gym: An RL-based Optimization Framework for Chiplet-based AI Accelerator ☆22 · Sep 28, 2024 · Updated last year
- ☆27 · Feb 27, 2025 · Updated last year
- ☆39 · Oct 14, 2025 · Updated 5 months ago
- A toolchain for rapid design space exploration of chiplet architectures ☆75 · Jul 25, 2025 · Updated 7 months ago
- LLM Inference analyzer for different hardware platforms ☆108 · Feb 17, 2026 · Updated last month
- ☆26 · Jan 30, 2026 · Updated last month
- A dynamic GPU memory allocator, suitable for warp synchronized scenarios. ☆11 · Aug 20, 2019 · Updated 6 years ago
- Anatomy of a powerhouse: SystemVerilog TPU based on Google TPU v1 ☆20 · Nov 9, 2025 · Updated 4 months ago
- ViTALiTy (HPCA'23) Code Repository ☆23 · Mar 13, 2023 · Updated 3 years ago
- This code base represents "faimGraph: High Performance Management of Fully-dynamic Graphs under tight Memory Constraints on the GPU" ☆14 · Apr 23, 2021 · Updated 4 years ago
- Here are some implementations of basic hardware units in RTL language (verilog for now), which can be used for area/power evaluation and … ☆14 · Aug 25, 2023 · Updated 2 years ago
- ☆11 · Nov 24, 2020 · Updated 5 years ago
- ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale ☆533 · Mar 12, 2026 · Updated last week
- Stanford CS149 - Programming Assignment 5 (Extra Credit) ☆15 · Dec 6, 2024 · Updated last year
- ETH Computer Architecture - Fall 2020 ☆13 · Feb 26, 2021 · Updated 5 years ago
- first-order deep learning accelerator model ☆22 · Nov 27, 2017 · Updated 8 years ago
- BBO optimiser ☆11 · Feb 11, 2020 · Updated 6 years ago
- Hardware Accelerated MWPM decoder for Quantum Error Correction ☆19 · Mar 23, 2025 · Updated last year
- ☆14 · Oct 11, 2024 · Updated last year
- ☆10 · Sep 7, 2023 · Updated 2 years ago
- GPU topology-aware scheduler ☆13 · Jul 7, 2017 · Updated 8 years ago
- A paper review list for computer architecture and systems research, maintained by the LEMONADE group at Peking University. ☆16 · Updated this week
- Lab for Digital Design and Computer Architecture Spring 2022 (252-0028-00L) (ETH). ☆14 · Mar 1, 2023 · Updated 3 years ago
- Slowdown prediction module of Echo: Simulating Distributed Training at Scale ☆13 · May 17, 2025 · Updated 10 months ago
- A Parallel Simulation Framework For Multicore Systems ☆10 · May 20, 2017 · Updated 8 years ago
- ☆12 · Jan 13, 2023 · Updated 3 years ago
- Attentionlego ☆13 · Jan 24, 2024 · Updated 2 years ago
- This repository contains the cuStinger data structure used for dynamic graph representation. ☆20 · Jan 11, 2019 · Updated 7 years ago
- ☆34 · Oct 21, 2025 · Updated 5 months ago
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆14 · Apr 29, 2025 · Updated 10 months ago
- LaTeX template for dissertation proposals in Peking University Shenzhen. ☆15 · Feb 23, 2022 · Updated 4 years ago
- A 2-Way Super-Scalar OoO RISC-V Core Based on Intel P6 Microarchitecture. ☆16 · Sep 27, 2022 · Updated 3 years ago
- HISIM introduces a suite of analytical models at the system level to speed up performance prediction for AI models, covering logic-on-log… ☆64 · Mar 17, 2025 · Updated last year
- VLSI placement and routing tool ☆15 · Dec 20, 2025 · Updated 3 months ago
- Single Cycle and Pipeline CPU of RISC-V Architecture designed for Digital Design and Computer Organization Experiments 2021, NJU ☆14 · Jan 17, 2022 · Updated 4 years ago
- A Multi-objective Multi-fidelity acquisition function for Bayesian optimization based on EHVI method. ☆14 · May 18, 2022 · Updated 3 years ago
- NSCSCC 2023 The Second Prize. TEAM PUA FROM HDU. ☆13 · Mar 29, 2025 · Updated 11 months ago
- Intelligent Resource Requirement Estimation and Scheduling for Deep Learning Jobs on Distributed GPU Clusters ☆15 · Nov 18, 2021 · Updated 4 years ago