Qualcomm-AI-research / llm-surgeon
☆23 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for llm-surgeon
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) · ☆79 · Updated last year
- Code for studying the super weight in LLMs · ☆16 · Updated last week
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry · ☆38 · Updated 10 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) · ☆53 · Updated last month
- Fast and memory-efficient exact attention · ☆27 · Updated last week
- Language models scale reliably with over-training and on downstream tasks · ☆94 · Updated 7 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" · ☆56 · Updated last month
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference · ☆37 · Updated this week
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… · ☆18 · Updated last year
- Triton Implementation of HyperAttention Algorithm · ☆46 · Updated 11 months ago
- Official code for the paper "Attention as a Hypernetwork" · ☆23 · Updated 5 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) · ☆61 · Updated 7 months ago
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective" · ☆42 · Updated 6 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆24 · Updated 7 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore · ☆19 · Updated 2 months ago
- LLM KV cache compression made easy · ☆64 · Updated last week
- Using FlexAttention to compute attention with different masking patterns (see the sketch after this list) · ☆40 · Updated 2 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"☆28Updated 7 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…☆78Updated 2 months ago
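For the FlexAttention entry above, here is a minimal sketch of what "different masking patterns" looks like in practice, using PyTorch's `torch.nn.attention.flex_attention` API (PyTorch >= 2.5). The shapes, the `causal_mask` function, and the CUDA device are illustrative assumptions, not code taken from the linked repo:

```python
# Minimal FlexAttention sketch (assumes PyTorch >= 2.5 and a CUDA device;
# shapes and the mask function are illustrative, not from the linked repo).
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 4, 128, 64  # batch, heads, sequence length, head dim

def causal_mask(b, h, q_idx, kv_idx):
    # Allow each query position to attend only to itself and earlier positions.
    return q_idx >= kv_idx

# Precompute a block-sparse mask so fully masked-out blocks are skipped entirely.
block_mask = create_block_mask(causal_mask, B=B, H=H, Q_LEN=S, KV_LEN=S, device="cuda")

q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)  # -> (B, H, S, D)
```

Swapping in a different `mask_mod` (sliding-window, prefix-LM, document masking) reuses the same `flex_attention` call unchanged; wrapping the call in `torch.compile` is what recovers fused-kernel performance.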