ChengZhang-98 / llm-mixed-q

Official implementation of EMNLP'23 paper "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"

☆23

Alternatives and similar repositories for llm-mixed-q

Users who are interested in llm-mixed-q are comparing it to the libraries listed below.

clevercool / ANT-Quantization
☆111Updated last year
jeffreyyu0602 / quantized-training
☆32Updated this week
abdelfattah-lab / BitMoD-HPCA-25
☆51Updated 3 months ago
ebby-s / MX-for-FPGA
Implementation of Microscaling data formats in SystemVerilog.
☆26Updated 3 months ago
hsharma35 / bitfusion
Simulator for BitFusion
☆102Updated 5 years ago
snu-comparch / Tender
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)
☆21Updated last year
pku-liang / Sanger
A co-design architecture on sparse attention
☆53Updated 4 years ago
Accelergy-Project / micro22-sparseloop-artifact
MICRO22 artifact evaluation for Sparseloop
☆44Updated 3 years ago
GATECH-EIC / ViTCoD
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
☆122Updated 2 years ago
hatsu3 / Sanger
☆48Updated 4 years ago
jha-lab / acceltran
[TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers
☆52Updated last year
sjtu-zhao-lab / SALO
An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences
☆29Updated last year
georgia-tech-synergy-lab / SIGMA
RTL implementation of Flex-DPE.
☆113Updated 5 years ago
isakedo / DNNsim
☆35Updated 5 years ago
mit-han-lab / spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
☆112Updated last year
maestro-project / gamma
☆41Updated last year
aojunzz / DominoSearch
☆19Updated 3 years ago
yanghr / BSQ
BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization (ICLR 2021)
☆41Updated 4 years ago
cornell-zhang / FracBNN
FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations
☆94Updated 4 years ago
Zhu-Zixuan / Bitlet-PE
A bit-level sparsity-aware multiply-accumulate processing element.
☆17Updated last year
arc-research-lab / SSR
SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration (Full Paper Accepted in FPGA'24)
☆33Updated this week
Zhaoshixin-sky / CIM-MLC
[ASPLOS 2024] CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
☆48Updated last year
PrincetonUniversity / LLMCompass
☆196Updated last week
KULeuven-MICAS / DeFiNES
A framework for fast exploration of the depth-first scheduling space for DNN accelerators
☆40Updated 2 years ago
KULeuven-MICAS / stream
Multi-core HW accelerator mapping optimization framework for layer-fused ML workloads.
☆61Updated 3 months ago
Accelergy-Project / accelergy
Accelergy is an infrastructure for estimating the energy of accelerators
☆150Updated 5 months ago
SFU-HiAccel / HiSpMV
[TRETS 2025][FPGA 2024] FPGA Accelerator for Imbalanced SpMV using HLS
☆15Updated 2 months ago
chiragsakhuja / spotlight
☆16Updated 2 years ago
SeoLabCornell / torch2chip
Torch2Chip (MLSys, 2024)
☆54Updated 6 months ago
dimdano / adapt
Fast Emulation of Approximate DNN Accelerators in PyTorch
☆26Updated last year