shaoyiHusky / SparseProgressiveDistillation
☆12 · Updated 2 years ago
Alternatives and similar repositories for SparseProgressiveDistillation
Users interested in SparseProgressiveDistillation are comparing it to the libraries listed below.
- [NeurIPS'23] Speculative Decoding with Big Little Decoder · ☆95 · Updated last year
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022) · ☆113 · Updated 3 years ago
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers · ☆192 · Updated 2 years ago
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022) · ☆46 · Updated 3 years ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 · ☆37 · Updated last year
- Code for the ACL 2022 publication Transkimmer: Transformer Learns to Layer-wise Skim · ☆22 · Updated 3 years ago
- [NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models · ☆39 · Updated 11 months ago
- ☆142 · Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) · ☆65 · Updated last year
- Efficient LLM Inference Acceleration using Prompting · ☆51 · Updated last year
- ☆62 · Updated 2 years ago
- A curated list of Early Exiting papers, benchmarks, and misc. · ☆119 · Updated 2 years ago
- A curated list of early exiting (LLM, CV, NLP, etc.) · ☆69 · Updated last year
- [KDD'22] Learned Token Pruning for Transformers · ☆102 · Updated 2 years ago
- ☆39 · Updated last year
- ☆17 · Updated 5 years ago
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408 · ☆198 · Updated 2 years ago
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models · ☆65 · Updated last year
- Multi-Candidate Speculative Decoding · ☆38 · Updated last year
- Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) · ☆46 · Updated 2 years ago
- First Latency-Aware Competitive LLM Agent Benchmark · ☆25 · Updated 6 months ago
- Official implementation for "Mixture of In-Context Experts Enhance LLMs’ Awareness of Long Contexts" (Accepted by Neurips2024)☆13Updated 11 months ago
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL 2024) · ☆50 · Updated last year
- All-in-one repository of awesome LLM pruning papers, integrating useful resources and insights · ☆142 · Updated 4 months ago
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023) · ☆26 · Updated last year
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) · ☆87 · Updated 9 months ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… · ☆87 · Updated 10 months ago
- A method for accelerating LLM inference via streamlined semi-autoregressive generation and draft verification · ☆26 · Updated 8 months ago
- [ICLR 2022] Code for the paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) · ☆22 · Updated 2 years ago
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" · ☆152 · Updated 9 months ago