shaoyiHusky / SparseProgressiveDistillation
☆12 · Updated 2 years ago
Alternatives and similar repositories for SparseProgressiveDistillation
Users interested in SparseProgressiveDistillation are comparing it to the repositories listed below.
- [NeurIPS'23] Speculative Decoding with Big Little Decoder · ☆96 · Updated 2 years ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). · ☆113 · Updated 3 years ago
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). · ☆46 · Updated 3 years ago
- Code for the ACL 2022 paper "Transkimmer: Transformer Learns to Layer-wise Skim" · ☆22 · Updated 3 years ago
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers · ☆192 · Updated 2 years ago
- ☆143 · Updated last year
- [KDD'22] Learned Token Pruning for Transformers · ☆102 · Updated 2 years ago
- [ICLR 2022] Code for the paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) · ☆22 · Updated 2 years ago
- First Latency-Aware Competitive LLM Agent Benchmark · ☆26 · Updated 8 months ago
- Code for the paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) · ☆46 · Updated 2 years ago
- ☆63 · Updated 2 years ago
- ICLR 2021 · ☆48 · Updated 4 years ago
- [NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models · ☆39 · Updated last year
- Block-sparse movement pruning · ☆83 · Updated 5 years ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) · ☆65 · Updated last year
- ☆17 · Updated 5 years ago
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models (https://arxiv.org/abs/2204.00408) · ☆198 · Updated 2 years ago
- Implementation of the NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" · ☆152 · Updated 10 months ago
- A curated list of early-exiting resources (LLM, CV, NLP, etc.) · ☆70 · Updated last year
- ☆21 · Updated 2 years ago
- An implementation of the DISP-LLM method from the NeurIPS 2024 paper "Dimension-Independent Structural Pruning for Large Language Models" · ☆23 · Updated 6 months ago
- Efficient LLM Inference Acceleration using Prompting · ☆51 · Updated last year
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" · ☆80 · Updated 7 months ago
- MLPruning: structured pruning for BERT (PyTorch, NLP) · ☆20 · Updated 4 years ago
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" · ☆48 · Updated last year
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023) · ☆26 · Updated last year
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models · ☆49 · Updated last year
- A curated list of Early Exiting papers, benchmarks, and misc. · ☆120 · Updated 2 years ago
- Sirius: an efficient correction mechanism that significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… · ☆21 · Updated last year
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al., TACL 2024) · ☆51 · Updated last year