shaoyiHusky / SparseProgressiveDistillationLinks

☆12

Alternatives and similar repositories for SparseProgressiveDistillation

Users that are interested in SparseProgressiveDistillation are comparing it to the libraries listed below

Sorting:

SimiaoZuo / MoEBERT
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆108Updated 3 years ago
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆93Updated last year
ChandlerGuan / Transkimmer
Code for ACL2022 publication Transkimmer: Transformer Learns to Layer-wise Skim
☆21Updated 2 years ago
yifanycc / loretta
[NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
☆36Updated 6 months ago
WoosukKwon / retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆191Updated 2 years ago
QingruZhang / PLATON
This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46Updated 2 years ago
thunlp / MoEfication
☆139Updated last year
henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆81Updated 4 months ago
camlsys / degree-quant
ICLR 2021
☆48Updated 4 years ago
liyunqianggyn / Awesome-LLMs-Pruning
Awesome LLM pruning papers all-in-one repository with integrating all useful resources and insights.
☆104Updated this week
CASE-Lab-UMD / Unified-MoE-Compression
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".
☆72Updated 4 months ago
yxli2123 / LoSparse
☆59Updated last year
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆60Updated 9 months ago
hmarkc / parallel-prompt-decoding
Efficient LLM Inference Acceleration using Prompting
☆48Updated 9 months ago
ruikangliu / IntactKV
[ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"
☆44Updated last year
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆42Updated 3 months ago
hdong920 / GRIFFIN
☆38Updated 10 months ago
twinkle0331 / Xcompression
[ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models"(https://arxiv.org/abs/2205.10036)
☆22Updated 2 years ago
Adaxry / Unified_Layer_Skipping
☆14Updated last year
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆57Updated last year
abdelfattah-lab / xKV
xKV: Cross-Layer SVD for KV-Cache Compression
☆27Updated last month
SalesforceAIResearch / ThinK
ThinK: Thinner Key Cache by Query-Driven Pruning
☆21Updated 5 months ago
shankarp8 / knowledge_distillation
Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).
☆26Updated 11 months ago
NJUNLP / MCSD
Multi-Candidate Speculative Decoding
☆35Updated last year
YouAreSpecialToMe / QST
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
☆46Updated 8 months ago
cornell-zhang / llm-datatypes
Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
☆27Updated last year
Glaciohound / LM-Infinite
Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆148Updated 4 months ago
mitchellgordon95 / bert-prune
☆17Updated 5 years ago
pprp / Awesome-LLM-Prune
Awesome list for LLM pruning.
☆245Updated 7 months ago
dilab-zju / self-speculative-decoding
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆195Updated 5 months ago