Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)
☆40Aug 28, 2023Updated 2 years ago
Alternatives and similar repositories for task-aware-distillation
Users that are interested in task-aware-distillation are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).☆46Oct 17, 2022Updated 3 years ago
- ☆22Feb 4, 2026Updated 2 months ago
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)☆29Feb 9, 2022Updated 4 years ago
- ☆16Mar 5, 2025Updated last year
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆39Jan 12, 2024Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).☆114May 2, 2022Updated 3 years ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025)☆13Feb 7, 2026Updated 2 months ago
- Train large COMET (T5-3B/GPT2-XL) with small memory (on 11GB memory GPUs like 1080/2080) using DeepSpeed.☆14Jan 23, 2022Updated 4 years ago
- Partially Non-Autoregressive Image Captioning☆10Sep 30, 2021Updated 4 years ago
- The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023)☆40Mar 25, 2023Updated 3 years ago
- Parameter Efficient Transfer Learning with Diff Pruning☆74Feb 3, 2021Updated 5 years ago
- ☆16Mar 6, 2024Updated 2 years ago
- ☆20Sep 5, 2024Updated last year
- [NeurIPS 2024] Search for Efficient LLMs☆16Jan 16, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)☆19Jul 28, 2021Updated 4 years ago
- [S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models☆19Feb 18, 2025Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated last year
- Official repository for "LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation" (TOMM 2023)…☆11Mar 21, 2023Updated 3 years ago
- [EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models☆211Feb 11, 2024Updated 2 years ago
- ☆64Oct 17, 2023Updated 2 years ago
- Codebase for the EMNLP 2021 paper "HittER: Hierarchical Transformers for Knowledge Graph Embeddings".☆12Nov 1, 2021Updated 4 years ago
- Chinese Biomedical Language Understanding Evaluation benchmark (ChineseBLUE)☆13Dec 23, 2019Updated 6 years ago
- This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicit…☆1,271Mar 9, 2025Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same…☆63Mar 21, 2026Updated 3 weeks ago
- A collection for math word problem (MWP) works, including datasets, algorithms and so on.☆47Jun 18, 2024Updated last year
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).☆26Aug 25, 2024Updated last year
- First Latency-Aware Competitive LLM Agent Benchmark☆26Jun 3, 2025Updated 10 months ago
- ☆12Jul 6, 2023Updated 2 years ago
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models☆75Nov 23, 2024Updated last year
- PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005☆45Nov 8, 2024Updated last year
- ☆22Oct 22, 2024Updated last year
- CLIPCleaner: Cleaning Noisy Labels with CLIP (ACM MM2024)☆15Apr 28, 2025Updated 11 months ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Code for Paper: Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data☆36Nov 16, 2020Updated 5 years ago
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…☆15Jun 6, 2025Updated 10 months ago
- ☆10Dec 28, 2018Updated 7 years ago
- cifar10 dataset in vgg architecture by tensorflow☆10Feb 24, 2018Updated 8 years ago
- Feature Structure Distillation with Centered Kernel Alignment in BERT Transferring official code☆11Jul 17, 2023Updated 2 years ago
- Synthesizing Fingerprint from Pattern Type Analysis Features using cGAN - WITC 2019☆12Apr 19, 2019Updated 6 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago