ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
★30 · Updated last year
Alternatives and similar repositories for Bonsai
Users interested in Bonsai are comparing it to the libraries listed below.
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ★110 · Updated 3 months ago
- [NeurIPS 2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) ★89 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ★123 · Updated last year
- Code for the paper "Patch-Level Training for Large Language Models" ★97 · Updated 2 months ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM ★105 · Updated last year
- Long Context Extension and Generalization in LLMs ★62 · Updated last year
- ★85 · Updated 2 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ★58 · Updated 11 months ago
- [ICML 2024 Oral] The official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ★67 · Updated last year
- Official PyTorch implementation of the paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models" ★20 · Updated 11 months ago
- Is gradient information useful for pruning LLMs? ★47 · Updated 5 months ago
- ★19 · Updated last year
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ★124 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ★56 · Updated 2 years ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ★78 · Updated last year
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs" ★20 · Updated 7 months ago
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw… ★31 · Updated last year
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ★103 · Updated 7 months ago
- ★27 · Updated 2 months ago
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ★52 · Updated last year
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ★73 · Updated 6 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ★44 · Updated last year
- FocusLLM: Scaling LLM's Context by Parallel Decoding ★44 · Updated last year
- Kinetics: Rethinking Test-Time Scaling Laws ★85 · Updated 6 months ago
- The official repository for "SkyLadder: Better and Faster Pretraining via Context Window Scheduling" ★42 · Updated last month
- Official implementation of the paper "A deeper look at depth pruning of LLMs" ★15 · Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ★36 · Updated last year
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ★84 · Updated 2 years ago
- ★23 · Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ★65 · Updated last year