Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)
☆40 · Aug 28, 2023 · Updated 2 years ago
Alternatives and similar repositories for task-aware-distillation
Users that are interested in task-aware-distillation are comparing it to the libraries listed below.
- [ICLR2025 Spotlight] Advantage-Guided Distillation for Preference Alignment in Small Language Models ☆26 · Feb 10, 2025 · Updated last year
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). ☆46 · Oct 17, 2022 · Updated 3 years ago
- ☆22 · Feb 4, 2026 · Updated 4 months ago
- (CVPR 2024) Uniformity and Variance for Heterogeneous Federated Learning ☆12 · Mar 6, 2024 · Updated 2 years ago
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022) ☆29 · Feb 9, 2022 · Updated 4 years ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆39 · Jan 12, 2024 · Updated 2 years ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆114 · May 2, 2022 · Updated 4 years ago
- ☆12 · Oct 9, 2023 · Updated 2 years ago
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model ☆13 · Feb 15, 2024 · Updated 2 years ago
- Train large COMET (T5-3B/GPT2-XL) with small memory (on 11GB memory GPUs like 1080/2080) using DeepSpeed. ☆14 · Jan 23, 2022 · Updated 4 years ago
- Partially Non-Autoregressive Image Captioning ☆10 · Sep 30, 2021 · Updated 4 years ago
- The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023) ☆40 · Mar 25, 2023 · Updated 3 years ago
- Parameter Efficient Transfer Learning with Diff Pruning ☆74 · Feb 3, 2021 · Updated 5 years ago
- ☆16 · Mar 6, 2024 · Updated 2 years ago
- Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021) ☆19 · Jul 28, 2021 · Updated 4 years ago
- [S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models ☆20 · Feb 18, 2025 · Updated last year
- Official Implementation for NorMuon paper ☆81 · Apr 30, 2026 · Updated last month
- Python Implementation of 'Spectral Clustering in Heterogeneous Information Networks'. ☆20 · Nov 13, 2019 · Updated 6 years ago
- ☆65 · Oct 17, 2023 · Updated 2 years ago
- AgentRE-Bench is an agentic benchmark that evaluates state-of-the-art models on long-horizon reverse engineering tasks, measuring their a… ☆64 · May 14, 2026 · Updated last month
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same… ☆63 · Mar 21, 2026 · Updated 2 months ago
- Map4RDF allows visualising and interacting with Linked Geospatial Data available in any SPARQL endpoint ☆10 · Feb 9, 2020 · Updated 6 years ago
- Explanation Optimization ☆13 · Oct 16, 2020 · Updated 5 years ago
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023). ☆27 · Aug 25, 2024 · Updated last year
- ☆11 · Jul 6, 2023 · Updated 2 years ago
- PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005 ☆47 · Nov 8, 2024 · Updated last year
- ☆22 · Oct 22, 2024 · Updated last year
- ☆13 · Dec 9, 2024 · Updated last year
- ☆23 · Nov 26, 2024 · Updated last year
- Code for Paper: Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data ☆36 · Nov 16, 2020 · Updated 5 years ago
- Exploring and improving the quality of ChatGPT-generated code for LeetCode programming tasks. ☆11 · Jan 19, 2024 · Updated 2 years ago
- ICML 2025 Oral: ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence ☆45 · Aug 8, 2025 · Updated 10 months ago
- ☆10 · Dec 28, 2018 · Updated 7 years ago
- Feature Structure Distillation with Centered Kernel Alignment in BERT Transferring official code ☆11 · Jul 17, 2023 · Updated 2 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 · Apr 28, 2023 · Updated 3 years ago
- CCS 2023 | Explainable malware and vulnerability detection with XAI in paper "FINER: Enhancing State-of-the-art Classifiers with Feature … ☆12 · Aug 20, 2024 · Updated last year
- PyTorch Implementation of "Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models", AAAI 2… ☆38 · Feb 4, 2026 · Updated 4 months ago
- PyTorch Implementation for InMaP ☆12 · Oct 28, 2023 · Updated 2 years ago
- [NeurIPS 2023] Code base for the Renyi Kernel Entropy (RKE) metric for generative models. ☆13 · Jun 18, 2025 · Updated 11 months ago