CASE-Lab-UMD / LLM-DropLinks

The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".

☆181

Alternatives and similar repositories for LLM-Drop

Users that are interested in LLM-Drop are comparing it to the libraries listed below

Sorting:

mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆121Updated last year
locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆187Updated last year
FasterDecoding / BitDelta
☆204Updated last year
HanGuo97 / lq-lora
☆128Updated last year
prateeky2806 / ties-merging
☆200Updated last year
llm-merging / LLM-Merging
LLM-Merging: Building LLMs Efficiently through Merging
☆205Updated last year
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆175Updated last year
jongwooko / distillm
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
☆240Updated 8 months ago
lucidrains / CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
☆178Updated last year
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124Updated last year
lucidrains / speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
☆295Updated 11 months ago
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147Updated last year
yxli2123 / LoftQ
☆235Updated last year
arpita8 / Awesome-Mixture-of-Experts-Papers
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
☆139Updated last year
melisa-writer / short-transformers
Prune transformer layers
☆74Updated last year
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆349Updated 7 months ago
SalesforceAIResearch / GemFilter
☆85Updated 3 weeks ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180Updated 5 months ago
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆51Updated last month
Cohere-Labs-Community / parameter-efficient-moe
☆272Updated 2 years ago
microsoft / rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
☆448Updated last year
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆118Updated last year
princeton-nlp / HELMET
The HELMET Benchmark
☆187Updated 3 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
nikhilgsh / loraplus
☆229Updated last year
koayon / awesome-adaptive-computation
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
☆160Updated 11 months ago
nightdessert / Retrieval_Head
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
☆222Updated last year
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆232Updated last month
jongwooko / distillm-2
Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral)
☆50Updated 5 months ago
TianjinYellow / EdgeDeviceLLMCompetition-Starting-Kit
☆43Updated last year