PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
☆21 · Jun 12, 2024 · Updated last year
Alternatives and similar repositories for PALM
Users who are interested in PALM are comparing it to the libraries listed below
- The framework for the paper "Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators" in ISCA 2023. ☆82 · Mar 12, 2025 · Updated 11 months ago
- ☆27 · Feb 27, 2025 · Updated last year
- ☆35 · Oct 14, 2025 · Updated 4 months ago
- This repository contains the code for this paper: Chiplet-Gym: An RL-based Optimization Framework for Chiplet-based AI Accelerator ☆22 · Sep 28, 2024 · Updated last year
- ViTALiTy (HPCA'23) Code Repository ☆23 · Mar 13, 2023 · Updated 2 years ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 · Jun 27, 2025 · Updated 8 months ago
- A toolchain for rapid design space exploration of chiplet architectures ☆74 · Jul 25, 2025 · Updated 7 months ago
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 · Apr 29, 2025 · Updated 10 months ago
- ☆10 · Sep 7, 2023 · Updated 2 years ago
- LLM Inference analyzer for different hardware platforms ☆101 · Feb 17, 2026 · Updated last week
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation ☆26 · Jun 16, 2025 · Updated 8 months ago
- ☆10 · Apr 24, 2024 · Updated last year
- BBO optimiser ☆11 · Feb 11, 2020 · Updated 6 years ago
- HISIM introduces a suite of analytical models at the system level to speed up performance prediction for AI models, covering logic-on-log… ☆62 · Mar 17, 2025 · Updated 11 months ago
- A dynamic GPU memory allocator, suitable for warp synchronized scenarios. ☆11 · Aug 20, 2019 · Updated 6 years ago
- ☆13 · Jul 14, 2025 · Updated 7 months ago
- Official Implementation of Robustifying and Boosting Training-Free Neural Architecture Search ☆10 · Mar 12, 2024 · Updated last year
- ☆14 · Oct 11, 2024 · Updated last year
- The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2… ☆14 · Dec 7, 2024 · Updated last year
- ☆11 · Nov 24, 2020 · Updated 5 years ago
- ☆11 · Apr 5, 2023 · Updated 2 years ago
- ☆15 · Jan 12, 2026 · Updated last month
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design ☆22 · Jul 4, 2025 · Updated 7 months ago
- Accelerate multihead attention transformer model using HLS for FPGA ☆11 · Dec 7, 2023 · Updated 2 years ago
- Anatomy of a powerhouse: SystemVerilog TPU based on Google TPU v1 ☆20 · Nov 9, 2025 · Updated 3 months ago
- [ICML 2025] Official PyTorch implementation of "NegMerge: Sign-Consensual Weight Merging for Machine Unlearning" ☆14 · Nov 25, 2025 · Updated 3 months ago
- Source code of our TNNLS paper "Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution" ☆12 · Apr 14, 2023 · Updated 2 years ago
- A Parallel Simulation Framework For Multicore Systems ☆10 · May 20, 2017 · Updated 8 years ago
- Official implementation of "Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent". ☆21 · May 23, 2025 · Updated 9 months ago
- [CVPR 2025] QuartDepth ☆17 · Mar 24, 2025 · Updated 11 months ago
- Hardware Accelerated MWPM decoder for Quantum Error Correction ☆18 · Mar 23, 2025 · Updated 11 months ago
- [ICCAD 2025] Squant ☆15 · Jul 3, 2025 · Updated 8 months ago
- Official PyTorch implementation of the paper entitled 'Self Attentive Pooling for Efficient Deep Learning'. ☆13 · May 3, 2024 · Updated last year
- ☆28 · Aug 4, 2025 · Updated 6 months ago
- ☆11 · Jun 28, 2020 · Updated 5 years ago
- Slowdown prediction module of Echo: Simulating Distributed Training at Scale ☆13 · May 17, 2025 · Updated 9 months ago
- ☆20 · Dec 16, 2025 · Updated 2 months ago
- ☆13 · Jul 25, 2024 · Updated last year
- ETH Computer Architecture - Fall 2020 ☆12 · Feb 26, 2021 · Updated 5 years ago