SamsungLabs / PMPD
Codebase for the Progressive Mixed-Precision Decoding paper.
☆18 · Updated 4 months ago
Alternatives and similar repositories for PMPD
Users who are interested in PMPD are comparing it to the libraries listed below.
- ☆113 · Updated 2 years ago
- Official implementation of the EMNLP'23 paper "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?" ☆24 · Updated 2 years ago
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆23 · Updated last year
- Simulator for BitFusion ☆102 · Updated 5 years ago
- BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization (ICLR 2021) ☆42 · Updated 4 years ago
- ☆32 · Updated last week
- Training with Block Minifloat number representation ☆17 · Updated 4 years ago
- [ECCV 2024] CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs ☆15 · Updated last year
- ☆35 · Updated 5 years ago
- ☆80 · Updated last year
- ☆26 · Updated last month
- ☆15 · Updated last year
- Implementation of Microscaling data formats in SystemVerilog ☆28 · Updated 5 months ago
- The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper "Outlier Suppression: Pushing the Limit of Low-bit Transformer L…" ☆49 · Updated 3 years ago
- TQT's PyTorch implementation ☆21 · Updated 3 years ago
- Code for studying the interplay between quantization and sparsity methods ☆24 · Updated 9 months ago
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning ☆114 · Updated last year
- Adaptive floating-point based numerical format for resilient deep learning ☆14 · Updated 3 years ago
- ☆209 · Updated last month
- [ICASSP'20] DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architecture… ☆25 · Updated 3 years ago
- ☆19 · Updated 3 years ago
- [TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers ☆54 · Updated 2 years ago
- Torch2Chip (MLSys 2024) ☆54 · Updated 8 months ago
- [HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design ☆123 · Updated 2 years ago
- Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts ☆129 · Updated last year
- ☆78 · Updated 3 years ago
- ☆19 · Updated 4 years ago
- GoldenEye is a functional simulator with fault injection capabilities for common and emerging numerical formats, implemented for the PyTo… ☆26 · Updated last year
- DeiT implementation for Q-ViT ☆25 · Updated 7 months ago
- Implementation of "NITI: Training Integer Neural Networks Using Integer-only Arithmetic" (arXiv) ☆86 · Updated 3 years ago