shawnricecake / edge-qat

Official Repo for EdgeQAT

☆15

Alternatives and similar repositories for edge-qat

Users who are interested in edge-qat are comparing it to the libraries listed below.

htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆65 Updated last year
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆54 Updated last year
yxli2123 / LoSparse
☆57 Updated last year
zyxxmu / DSnoT
Official Pytorch Implementation of Our Paper Accepted at ICLR 2024-- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆47 Updated last year
abdelfattah-lab / TokenButler
☆23 Updated 2 months ago
yifanycc / loretta
[NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
☆35 Updated 5 months ago
VILA-Lab / GBLM-Pruner
Are gradient information useful for pruning of LLMs?
☆46 Updated last year
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28 Updated last year
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆37 Updated 9 months ago
ylsung / ECoFLaP
Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)
☆19 Updated last year
SempraETY / Pruning-via-Merging
☆18 Updated 7 months ago
ruikangliu / IntactKV
[ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"
☆44 Updated last year
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆60 Updated 9 months ago
fmfi-compbio / admm-pruning
☆28 Updated 11 months ago
Intelligent-Computing-Lab-Yale / TesseraQ
☆22 Updated 7 months ago
LinkAnonymous / BESA
☆10 Updated last year
DavidFanzz / SCMoE
☆26 Updated last year
SqueezeAILab / SqueezedAttention
SQUEEZED ATTENTION: Accelerating Long Prompt LLM Inference
☆47 Updated 7 months ago
wimh966 / outlier_suppression
The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…
☆47 Updated 2 years ago
liyunqianggyn / Awesome-LLMs-Pruning
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆93 Updated 6 months ago
lliai / D2MoE
D^2-MoE: Delta Decompression for MoE-based LLMs Compression
☆48 Updated 3 months ago
BaohaoLiao / ApiQ
[EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs
☆13 Updated 11 months ago
jiwonsong-dev / SLEB
Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆37 Updated 4 months ago
Qualcomm-AI-research / outlier-free-transformers
☆42 Updated last year
MrGGLS / BlockPruner
A block pruning framework for LLMs.
☆23 Updated last month
biomedical-cybernetics / Relative-importance-and-activation-pruning
☆46 Updated last year
kssteven418 / SqueezeLLM-gradients
☆20 Updated last year
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆52 Updated 2 years ago
ruikangliu / Quantized-Reasoning-Models
Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
☆36 Updated 3 weeks ago
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆62 Updated 3 months ago