MAC-AutoML / YOCO-BERT

The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

☆48

Alternatives and similar repositories for YOCO-BERT

Users who are interested in YOCO-BERT are comparing it to the libraries listed below.

cheneydon / efficient-bert
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …
☆33 Updated 2 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120 Updated 4 years ago
huggingface / block_movement_pruning
Block Sparse movement pruning
☆81 Updated 4 years ago
bellymonster / Weighted-Soft-Label-Distillation
☆57 Updated 4 years ago
VITA-Group / UMEC
[ICLR 2021] "UMEC: Unified Model and Embedding Compression for Efficient Recommendation Systems" by Jiayi Shen, Haotao Wang*, Shupeng Gui…
☆39 Updated 3 years ago
sseung0703 / Zero-shot_Knowledge_Distillation
Zero-Shot Knowledge Distillation in Deep Networks (ICML 2019)
☆49 Updated 6 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using PyTorch
☆70 Updated 5 years ago
ankandrew / online-label-smoothing-pt
Implementation of Online Label Smoothing in PyTorch
☆94 Updated 2 years ago
cjrd / selfaugment
Code for SelfAugment
☆27 Updated 4 years ago
intersun / CoDIR
Code for EMNLP 2020 paper CoDIR
☆41 Updated 2 years ago
VITA-Group / AsViT
[ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa…
☆76 Updated 3 years ago
XinbangZhang / DATA-NAS
Code for DATA: Differentiable ArchiTecture Approximation.
☆11 Updated 4 years ago
xtinkt / editable
Supplementary code for Editable Neural Networks, an ICLR 2020 submission.
☆46 Updated 5 years ago
vcl-iisc / ZSKD
Zero-Shot Knowledge Distillation in Deep Networks
☆67 Updated 3 years ago
microsoft / GEM
☆24 Updated 4 years ago
dguo98 / DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
☆74 Updated 4 years ago
vfdev-5 / UDA-pytorch
Unsupervised Data Augmentation experiments in PyTorch
☆60 Updated 6 years ago
littleredxh / HardNegative
☆52 Updated 4 years ago
VITA-Group / EarlyBERT
[ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, …
☆18 Updated 3 years ago
xiaomi-automl / MixPath
MixPath: A Unified Approach for One-shot Neural Architecture Search
☆29 Updated 4 years ago
vrvlive / knowlege-distillation
PyTorch and PyTorch Lightning framework for trying knowledge distillation on image classification problems
☆32 Updated last year
lucidrains / distilled-retriever-pytorch
Implementation of the retriever distillation procedure as outlined in the paper "Distilling Knowledge from Reader to Retriever"
☆32 Updated 4 years ago
lottery-ticket / rewinding-iclr20-public
☆70 Updated 5 years ago
tbachlechner / ReZero-examples
PyTorch Examples repo for "ReZero is All You Need: Fast Convergence at Large Depth"
☆61 Updated last year
thunlp / TR-BERT
Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"
☆47 Updated 3 years ago
bigaidream-projects / role-kd
Role-Wise Data Augmentation for Knowledge Distillation
☆19 Updated 2 years ago
linzehui / Curriculum-Learning-PaperList-Materials
Curriculum Learning related papers and materials
☆54 Updated 4 years ago
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆98 Updated 2 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73 Updated 2 years ago
pkuzengqi / Skyformer
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021)
☆62 Updated 3 years ago