shaoyiHusky / SparseProgressiveDistillation
☆12 · Updated 2 years ago
Alternatives and similar repositories for SparseProgressiveDistillation
Users interested in SparseProgressiveDistillation are comparing it to the libraries listed below.
- [NeurIPS'23] Speculative Decoding with Big Little Decoder (☆96, updated 2 years ago)
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). (☆113, updated 3 years ago)
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers (☆192, updated 2 years ago)
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). (☆46, updated 3 years ago)
- (☆143, updated last year)
- Code for the ACL 2022 publication Transkimmer: Transformer Learns to Layer-wise Skim (☆22, updated 3 years ago)
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 (☆37, updated last year)
- Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) (☆46, updated 2 years ago)
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023). (☆26, updated last year)
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408 (☆198, updated 2 years ago)
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) (☆65, updated last year)
- (☆17, updated 5 years ago)
- [KDD'22] Learned Token Pruning for Transformers (☆102, updated 2 years ago)
- Efficient LLM Inference Acceleration using Prompting (☆51, updated last year)
- (☆63, updated 2 years ago)
- First Latency-Aware Competitive LLM Agent Benchmark (☆26, updated 8 months ago)
- [NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models (☆39, updated last year)
- [ICLR 2022] Code for the paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) (☆22, updated 2 years ago)
- A curated list of Early Exiting papers, benchmarks, and misc. (☆120, updated 2 years ago)
- Block Sparse movement pruning (☆83, updated 5 years ago)
- Train large COMET (T5-3B/GPT2-XL) with small memory (on 11GB-memory GPUs like the 1080/2080) using DeepSpeed. (☆14, updated 4 years ago)
- (☆39, updated last year)
- Sirius, an efficient correction mechanism that significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… (☆21, updated last year)
- This package implements THOR: Transformer with Stochastic Experts. (☆65, updated 4 years ago)
- Implementation of the NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" (☆152, updated 10 months ago)
- ThinK: Thinner Key Cache by Query-Driven Pruning (☆27, updated 11 months ago)
- ICLR 2021 (☆48, updated 4 years ago)
- Finetuning LLaMA with DeepSpeed (☆10, updated 2 years ago)
- (☆21, updated 2 years ago)
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. (☆26, updated 9 months ago)