shaoyiHusky / SparseProgressiveDistillation
☆12 · Updated last year
Alternatives and similar repositories for SparseProgressiveDistillation
Users interested in SparseProgressiveDistillation are comparing it to the libraries listed below.
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆108 · Updated 3 years ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆93 · Updated last year
- Code for the ACL 2022 publication Transkimmer: Transformer Learns to Layer-wise Skim ☆21 · Updated 2 years ago
- [NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models ☆36 · Updated 6 months ago
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers ☆191 · Updated 2 years ago
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). ☆46 · Updated 2 years ago
- ☆139 · Updated last year
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆81 · Updated 4 months ago
- ICLR 2021 ☆48 · Updated 4 years ago
- An all-in-one repository of awesome LLM pruning papers, integrating useful resources and insights. ☆104 · Updated this week
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR). ☆72 · Updated 4 months ago
- ☆59 · Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆60 · Updated 9 months ago
- Efficient LLM Inference Acceleration using Prompting ☆48 · Updated 9 months ago
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" ☆44 · Updated last year
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆42 · Updated 3 months ago
- ☆38 · Updated 10 months ago
- [ICLR 2022] Code for the paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) ☆22 · Updated 2 years ago
- ☆14 · Updated last year
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆57 · Updated last year
- xKV: Cross-Layer SVD for KV-Cache Compression ☆27 · Updated last month
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆21 · Updated 5 months ago
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023). ☆26 · Updated 11 months ago
- Multi-Candidate Speculative Decoding ☆35 · Updated last year
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆46 · Updated 8 months ago
- Codebase for the ICML'24 paper "Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs" ☆27 · Updated last year
- Implementation of the NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆148 · Updated 4 months ago
- ☆17 · Updated 5 years ago
- Awesome list for LLM pruning. ☆245 · Updated 7 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆195 · Updated 5 months ago