chandar-lab / EfficientLLMsLinks

☆17

Alternatives and similar repositories for EfficientLLMs

Users that are interested in EfficientLLMs are comparing it to the libraries listed below

Sorting:

yxli2123 / LoSparse
☆60Updated last year
WoosukKwon / retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆192Updated 2 years ago
facebookresearch / LLM-QAT
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆314Updated 7 months ago
IST-DASLab / OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆128Updated 2 years ago
Raincleared-Song / sparse_gpu_operator
GPU operators for sparse tensor operations
☆35Updated last year
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆94Updated last year
luuyin / OWL
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆72Updated 3 months ago
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆61Updated last year
DD-DuDa / BitDistiller
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆123Updated last year
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆66Updated 6 months ago
kssteven418 / SqueezeLLM-gradients
☆20Updated last year
wimh966 / outlier_suppression
The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…
☆48Updated 3 years ago
haochengxi / Train_Transformers_with_INT4
☆156Updated 2 years ago
nbasyl / LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
☆215Updated last year
htqin / BiBERT
This project is the official implementation of our accepted ICLR 2022 paper BiBERT: Accurate Fully Binarized BERT.
☆88Updated 2 years ago
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆66Updated last year
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆100Updated 2 years ago
liyunqianggyn / Awesome-LLMs-Pruning
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆123Updated 2 months ago
ScalingIntelligence / CATS
☆28Updated 11 months ago
FasterDecoding / TEAL
☆143Updated 7 months ago
jmluu / Awesome-Efficient-Training
A collection of research papers on efficient training of DNNs
☆69Updated 3 years ago
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆153Updated 4 months ago
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆320Updated last year
thu-ml / TetraJet-MXFP4Training
Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training
☆30Updated 3 months ago
thu-nics / qllm-eval
Code Repository of Evaluating Quantized Large Language Models
☆132Updated last year
HazyResearch / fly
☆217Updated 2 years ago
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆63Updated last year
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39Updated 2 years ago
ModelTC / Outlier_Suppression_Plus
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆47Updated last year
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆115Updated 3 months ago