ldery / Bonsai

Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"

☆28

Alternatives and similar repositories for Bonsai

Users who are interested in Bonsai are comparing it to the libraries listed below.


PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆102 Updated last week
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆97 Updated 10 months ago
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆95 Updated last month
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆88 Updated last year
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆19 Updated 4 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆62 Updated last year
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124 Updated last year
UNITES-Lab / MC-SMoE
[ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆96 Updated 4 months ago
Infini-AI-Lab / Kinetics
Kinetics: Rethinking Test-Time Scaling Laws
☆81 Updated 3 months ago
SalesforceAIResearch / GemFilter
☆85 Updated 9 months ago
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78 Updated last year
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆120 Updated last year
OpenSparseLLMs / MoM
☆104 Updated last month
IST-DASLab / DarwinLM
Official PyTorch Implementation of Paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models"
☆18 Updated 8 months ago
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆88 Updated 11 months ago
VILA-Lab / GBLM-Pruner
Is gradient information useful for pruning of LLMs?
☆47 Updated 2 months ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆55 Updated 2 years ago
abdelfattah-lab / TokenButler
☆25 Updated 2 months ago
song-wx / SIFT
[ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆22 Updated last year
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55 Updated 8 months ago
Infini-AI-Lab / S2FT
☆19 Updated 9 months ago
hdong920 / GRIFFIN
☆38 Updated last year
DRSY / KV_Compression
[EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens
☆25 Updated last year
SempraETY / Pruning-via-Merging
☆20 Updated 10 months ago
locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆184 Updated last year
thu-coai / MiniPLM
[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
☆61 Updated 11 months ago
VITA-Group / Ms-PoE
"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw…
☆30 Updated last year
r-three / smear
☆30 Updated 2 years ago
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆37 Updated 8 months ago
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆94 Updated 11 months ago