MICRO 2024 Evaluation Artifact for FuseMax
☆16 Aug 26, 2024 Updated last year
Alternatives and similar repositories for micro24-fusemax-artifact
Users interested in micro24-fusemax-artifact are comparing it to the repositories listed below
- ☆17 Mar 8, 2025 Updated last year
- MICRO 2023 Evaluation Artifact for TeAAL ☆10 Oct 26, 2023 Updated 2 years ago
- Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib… ☆22 Mar 29, 2025 Updated 11 months ago
- ☆13 May 8, 2025 Updated 10 months ago
- Artifact for "DX100: A Programmable Data Access Accelerator for Indirection (ISCA 2025)" paper ☆17 Nov 6, 2025 Updated 4 months ago
- CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark ☆34 Jun 24, 2025 Updated 8 months ago
- ☆17 Oct 7, 2025 Updated 5 months ago
- ☆17 Mar 26, 2025 Updated 11 months ago
- Implementation of Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning in Chisel HDL. To know more, … ☆17 Oct 9, 2021 Updated 4 years ago
- The GPU implementation of bucket-based farthest point sampling, achieving a 3-4x speedup over the conventional implementation ☆21 Aug 16, 2023 Updated 2 years ago
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆31 Jul 4, 2024 Updated last year
- [FPL'24] This repository contains the source code for the paper “Revealing Untapped DSP Optimization Potentials for FPGA-based Systolic M… ☆21 May 6, 2024 Updated last year
- ☆19 Jan 2, 2026 Updated 2 months ago
- StateMover is a checkpoint-based debugging framework for FPGAs. ☆22 Jul 14, 2022 Updated 3 years ago
- All the tools you need to reproduce the CellIFT paper experiments ☆24 Feb 11, 2025 Updated last year
- This is the open-source version of TinyTS. The code is dirty so far. We may clean the code in the future. ☆19 Aug 11, 2025 Updated 6 months ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 Jun 27, 2025 Updated 8 months ago
- LLM Inference with Microscaling Format ☆34 Nov 12, 2024 Updated last year
- The CPU implementation of bucket-based farthest point sampling, achieving a 7-81x speedup over the conventional implementation ☆26 Sep 17, 2023 Updated 2 years ago
- mNPUsim: A Cycle-accurate Multi-core NPU Simulator (IISWC 2023) ☆72 Dec 29, 2025 Updated 2 months ago
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 Apr 29, 2025 Updated 10 months ago
- PyTorchSim is a Comprehensive, Fast, and Accurate NPU Simulation Framework ☆93 Updated this week
- HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs ☆39 Dec 9, 2024 Updated last year
- Luthier, a GPU binary instrumentation tool for AMD GPUs ☆27 Updated this week
- ☆143 Jul 19, 2025 Updated 7 months ago
- H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference ☆91 Apr 26, 2025 Updated 10 months ago
- Error-free transformations are used to get results with extra accuracy. ☆15 Jan 20, 2025 Updated last year
- [ICLR 2026] FastCar ☆16 May 22, 2025 Updated 9 months ago
- A parser for PTX 6.5 ☆13 Jun 19, 2023 Updated 2 years ago
- Accelerator Zoo ☆20 Oct 14, 2025 Updated 4 months ago
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation ☆26 Jun 16, 2025 Updated 8 months ago
- ☆10 Apr 24, 2024 Updated last year
- ☆13 Jan 16, 2026 Updated last month
- ☆78 Aug 29, 2025 Updated 6 months ago
- Official Implementation of Robustifying and Boosting Training-Free Neural Architecture Search ☆10 Mar 12, 2024 Updated last year
- ☆14 Oct 30, 2024 Updated last year
- The official implementation of Bi-Mamba ☆14 Oct 22, 2025 Updated 4 months ago
- Boosted E-Graph Extraction with Adaptive Heuristics and Exact Solving ☆29 Jan 7, 2026 Updated 2 months ago
- ☆11 Dec 23, 2025 Updated 2 months ago