clevercool / SQuant

SQuant [ICLR22]

☆129

Alternatives and similar repositories for SQuant

Users who are interested in SQuant are comparing it to the libraries listed below.


clevercool / TileSparsity
☆103 · Updated 4 years ago
fxmeng / Pruning-Filter-in-Filter
Pruning Filter in Filter (NeurIPS 2020)
☆148 · Updated last year
bytedance / ABQ-LLM
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
☆238 · Updated last year
Qcompiler / MIXQ
MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction
☆94 · Updated last year
ZLkanyo009 / MQBench_Quantize
QAT (quantization-aware training) for classification with MQBench
☆28 · Updated 4 years ago
OPTML-Group / BiP
[NeurIPS22] "Advancing Model Pruning via Bi-level Optimization" by Yihua Zhang*, Yuguang Yao*, Parikshit Ram, Pu Zhao, Tianlong Chen, Min…
☆117 · Updated 2 years ago
Qcompiler / vllm-mixed-precision
Supports mixed-precision inference with vLLM
☆84 · Updated 4 months ago
Ptolemy-DL / Ptolemy
☆95 · Updated 4 years ago
xiexi51 / ICCAD-Accel-GCN
Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks"
☆51 · Updated 8 months ago
Qcompiler / MixQ_Tensorrt_LLM
Mixed-precision inference with TensorRT-LLM
☆80 · Updated last year
xiexi51 / MaxK-GNN
Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"
☆40 · Updated last year
AIoT-MLSys-Lab / SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2
☆261 · Updated 2 months ago
liuzuyan / ElasticCache
[ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache
☆42 · Updated last year
ByteDance-Seed / SDP4Bit
Official implementation of the paper "SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training"
☆41 · Updated 11 months ago
ByteDance-Seed / ShadowKV
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆272 · Updated 6 months ago
Phoenix8215 / BuildCudaNeuralNetworkFromScratch
Build CUDA Neural Network From Scratch
☆21 · Updated last year
Jianf-Wang / RSG
A PyTorch implementation of the CVPR 2021 paper "RSG: A Simple but Effective Module for Learning Imbalanced Datasets"
☆107 · Updated 3 years ago
peiswang / BitSplit
BitSplit post-training quantization
☆50 · Updated 3 years ago
xytpai / kfunca
KFunca: A minimalist, high-performance GPU-based automatic differentiation framework
☆28 · Updated 3 months ago
snap-research / F8Net
[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
☆94 · Updated 3 years ago
Phoenix8215 / torchstat2
Model analyzer in PyTorch
☆90 · Updated 2 months ago
yuleung / FPPQ
Implementation of the NeurIPS 2023 paper "Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval"
☆11 · Updated last year
CR400AF-A / SparseMM
[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
☆78 · Updated last month
papers-submission / structured_transposable_masks
Code for an ICML 2021 submission
☆35 · Updated 4 years ago
BillAmihom / RAPQ
PyTorch implementation of RAPQ (IJCAI 2022)
☆23 · Updated 2 years ago
Ledzy / StreamBP
Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs".
☆73 · Updated 4 months ago
abdelfattah-lab / BitMoD-HPCA-25
☆76 · Updated 4 months ago
Zhen-Dong / BitPack
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.
☆58 · Updated 2 years ago
ziplab / QTool
A collection of model quantization algorithms. For any issues, please contact Peng Chen (blueardour@gmail.com).
☆72 · Updated 4 years ago
Jianf-Wang / NP-Match
A PyTorch implementation of the ICML 2022 paper "NP-Match: When Neural Processes meet Semi-Supervised Learning"
☆96 · Updated 2 years ago