MAC-AutoML / YOCO-BERT
The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.
☆48 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for YOCO-BERT
- This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron … ☆32 · Updated last year
- Block Sparse movement pruning ☆78 · Updated 3 years ago
- [ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, … ☆17 · Updated 2 years ago
- Code for paper "Continual and Multi-Task Architecture Search (ACL 2019)" ☆41 · Updated 5 years ago
- Parameter Efficient Transfer Learning with Diff Pruning ☆72 · Updated 3 years ago
- Code for SelfAugment ☆27 · Updated 3 years ago
- The implementation of multi-branch attentive Transformer (MAT). ☆33 · Updated 4 years ago
- MixPath: A Unified Approach for One-shot Neural Architecture Search ☆28 · Updated 4 years ago
- Codes for DATA: Differentiable ArchiTecture Approximation. ☆11 · Updated 3 years ago
- A supplementary code for Editable Neural Networks, an ICLR 2020 submission. ☆46 · Updated 4 years ago
- [ICLR 2021] "UMEC: Unified Model and Embedding Compression for Efficient Recommendation Systems" by Jiayi Shen, Haotao Wang*, Shupeng Gui… ☆39 · Updated 2 years ago
- [ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Che… ☆81 · Updated 2 years ago
- Role-Wise Data Augmentation for Knowledge Distillation ☆18 · Updated 2 years ago
- Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference" ☆44 · Updated 2 years ago
- Code for BlockSwap (ICLR 2020). ☆33 · Updated 3 years ago
- Zero-Shot Knowledge Distillation in Deep Networks in ICML2019 ☆49 · Updated 5 years ago
- Zero-Shot Knowledge Distillation in Deep Networks ☆64 · Updated 2 years ago
- Revisiting Parameter Sharing for Automatic Neural Channel Number Search, NeurIPS 2020 ☆20 · Updated 4 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆59 · Updated 2 years ago
- [NeurIPS 2020] "The Lottery Ticket Hypothesis for Pre-trained BERT Networks", Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Ya… ☆138 · Updated 2 years ago
- Code for paper 'Minimizing FLOPs to Learn Efficient Sparse Representations' published at ICLR 2020 ☆21 · Updated 4 years ago
- Improving generalization by controlling label-noise information in neural network weights. ☆39 · Updated 4 years ago
- Code for ViTAS_Vision Transformer Architecture Search ☆51 · Updated 3 years ago
- Implementation of the retriever distillation procedure as outlined in the paper "Distilling Knowledge from Reader to Retriever" ☆32 · Updated 3 years ago
- Automated neural architecture search algorithms implemented in PyTorch and Autogluon toolkit. ☆12 · Updated 4 years ago
- Knowledge Distillation Algorithms implemented with PyTorch ☆17 · Updated 5 years ago
- Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search ☆17 · Updated 3 months ago