princeton-nlp/LLM-Shearing

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/princeton-nlp/LLM-Shearing)

princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

☆640

Alternatives and similar repositories for LLM-Shearing

Users that are interested in LLM-Shearing are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

horseee / LLM-Pruner
View on GitHub
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich…
☆1,130Oct 7, 2024Updated last year
locuslab / wanda
View on GitHub
A simple and effective LLM pruning approach.
☆868Aug 9, 2024Updated last year
microsoft / TransformerCompression
View on GitHub
For releasing code related to compression methods for transformers, accompanying our publications
☆461Jan 16, 2025Updated last year
IST-DASLab / sparsegpt
View on GitHub
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
☆889Aug 20, 2024Updated last year
CASIA-LMC-Lab / FLAP
View on GitHub
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆76Jan 6, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
hao-ai-lab / LookaheadDecoding
View on GitHub
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,340Mar 6, 2025Updated last year
arcee-ai / mergekit
View on GitHub
Tools for merging pretrained large language models.
☆7,246Jun 17, 2026Updated last month
princeton-nlp / MeZO
View on GitHub
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
☆1,168Jan 11, 2024Updated 2 years ago
pjlab-sys4nlp / llama-moe
View on GitHub
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
☆1,004Dec 6, 2024Updated last year
princeton-nlp / CoFiPruning
View on GitHub
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆199May 9, 2023Updated 3 years ago
princeton-nlp / QuRating
View on GitHub
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆204Dec 8, 2025Updated 7 months ago
yule-BUAA / MergeLM
View on GitHub
Codebase for Merging Language Models (ICML 2024)
☆869May 5, 2024Updated 2 years ago
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,757Jun 25, 2024Updated 2 years ago
yegcjs / mixinglaws
View on GitHub
☆113Jul 15, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
pprp / Awesome-LLM-Prune
View on GitHub
Awesome list for LLM pruning.
☆297Oct 11, 2025Updated 9 months ago
YuchuanTian / RethinkTinyLM
View on GitHub
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆126Jan 14, 2025Updated last year
ldery / Bonsai
View on GitHub
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆32Mar 28, 2024Updated 2 years ago
EleutherAI / lm-evaluation-harness
View on GitHub
A framework for few-shot evaluation of language models.
☆13,340Jul 13, 2026Updated last week
pratyushasharma / laser
View on GitHub
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
☆395Jul 9, 2024Updated 2 years ago
alibaba / Megatron-LLaMA
View on GitHub
Best practice for training LLaMA models in Megatron-LM
☆666Jan 2, 2024Updated 2 years ago
whyNLP / LCKV
View on GitHub
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆157Apr 7, 2025Updated last year
NVlabs / MaskLLM
View on GitHub
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
☆189Jan 1, 2025Updated last year
NVlabs / Minitron
View on GitHub
A family of compressed models obtained via pruning and knowledge distillation
☆383Nov 6, 2025Updated 8 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
XueFuzhao / OpenMoE
View on GitHub
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,691Mar 8, 2024Updated 2 years ago
jzhang38 / TinyLlama
View on GitHub
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆9,014May 3, 2024Updated 2 years ago
horseee / Awesome-Efficient-LLM
View on GitHub
A curated list for Efficient Large Language Models
☆2,023Jun 17, 2025Updated last year
microsoft / rho
View on GitHub
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
☆470Apr 18, 2024Updated 2 years ago
sangmichaelxie / doremi
View on GitHub
Pytorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
☆357Dec 26, 2023Updated 2 years ago
princeton-nlp / AutoCompressors
View on GitHub
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
☆337Sep 9, 2024Updated last year
OpenGVLab / OmniQuant
View on GitHub
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆901Nov 26, 2025Updated 7 months ago
sramshetty / ShortGPT
View on GitHub
Unofficial implementations of block/layer-wise pruning methods for LLMs.
☆78Apr 29, 2024Updated 2 years ago
p-lambda / dsir
View on GitHub
DSIR large-scale data selection framework for language model training
☆275Apr 7, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
FMInference / DejaVu
View on GitHub
☆359Apr 2, 2024Updated 2 years ago
jquesnelle / yarn
View on GitHub
YaRN: Efficient Context Window Extension of Large Language Models
☆1,737Apr 17, 2024Updated 2 years ago
arcee-ai / PruneMe
View on GitHub
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆266Apr 23, 2024Updated 2 years ago
IBM / ModuleFormer
View on GitHub
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…
☆225Sep 18, 2025Updated 10 months ago
FasterDecoding / REST
View on GitHub
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220Mar 5, 2026Updated 4 months ago
uclaml / SPIN
View on GitHub
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,247May 8, 2024Updated 2 years ago
aim-uofa / LoRAPrune
View on GitHub
☆63Dec 15, 2024Updated last year