SalesforceAIResearch / ThinK

ThinK: Thinner Key Cache by Query-Driven Pruning

☆24

Alternatives and similar repositories for ThinK

Users who are interested in ThinK are comparing it to the libraries listed below.

SempraETY / Pruning-via-Merging
☆20 Updated 10 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆84 Updated last week
osehmathias / lisa
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
☆35 Updated last year
sail-sg / LongSpec
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆64 Updated 3 months ago
thunlp / MoEfication
☆140 Updated last year
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆105 Updated last year
yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆33 Updated last year
zyxxmu / DSnoT
Official Pytorch Implementation of Our Paper Accepted at ICLR 2024-- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆50 Updated last year
UNITES-Lab / MC-SMoE
[ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆95 Updated 3 months ago
UCSB-NLP-Chang / ThinkPrune
☆44 Updated 3 weeks ago
Lucky-Lance / SPP
[ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
☆21 Updated last year
AIoT-MLSys-Lab / D2O
[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
☆23 Updated 3 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆126 Updated 3 months ago
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆89 Updated last month
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆61 Updated last year
hdong920 / GRIFFIN
☆38 Updated last year
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆66 Updated 6 months ago
AkideLiu / MiniCache
☆10 Updated last year
nightdessert / Retrieval_Head
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
☆215 Updated last year
abdelfattah-lab / TokenButler
☆25 Updated 2 months ago
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆44 Updated 6 months ago
OpenSparseLLMs / MoM
☆101 Updated last month
mutonix / pyramidinfer
☆46 Updated 10 months ago
locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆184 Updated last year
liyunqianggyn / Awesome-LLMs-Pruning
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆125 Updated 2 months ago
aim-uofa / LoRAPrune
☆59 Updated 10 months ago
QingruZhang / PLATON
This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46 Updated 3 years ago
Kwai-Klear / RLEP
RL with Experience Replay
☆47 Updated 2 months ago
alvin-zyl / CoLA
Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
☆23 Updated 8 months ago
OpenSparseLLMs / LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
☆87 Updated 10 months ago