Code for the paper "Are Sixteen Heads Really Better than One?"
☆175 · Apr 1, 2020 · Updated 5 years ago
Alternatives and similar repositories for are-16-heads-really-better-than-1
Users interested in are-16-heads-really-better-than-1 are comparing it to the libraries listed below.
- This is a repository with the code for the ACL 2019 paper "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, t… ☆319 · Aug 2, 2021 · Updated 4 years ago
- Transformer training code for sequential tasks ☆609 · Sep 14, 2021 · Updated 4 years ago
- ☆472 · Apr 4, 2021 · Updated 4 years ago
- ⛵️ The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020). ☆315 · Jun 12, 2023 · Updated 2 years ago
- Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro… ☆62 · Sep 17, 2025 · Updated 6 months ago
- Code for the EMNLP 2020 paper CoDIR ☆41 · Oct 4, 2022 · Updated 3 years ago
- Tool for Evaluating Adversarial Perturbations on Text ☆61 · Feb 27, 2022 · Updated 4 years ago
- Official PyTorch Implementation of Length-Adaptive Transformer (ACL 2021) ☆102 · Nov 2, 2020 · Updated 5 years ago
- [EMNLP'19] Summary for Transformer Understanding ☆53 · Nov 26, 2019 · Updated 6 years ago
- Implementation for the paper "Unsupervised Domain Adaptation on Reading Comprehension" ☆30 · May 21, 2020 · Updated 5 years ago
- Source code of the paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 4 years ago
- Resources for the MRQA 2019 Shared Task ☆294 · Aug 5, 2021 · Updated 4 years ago
- Longformer: The Long-Document Transformer ☆2,189 · Feb 8, 2023 · Updated 3 years ago
- ☆17 · May 14, 2020 · Updated 5 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆359 · Feb 22, 2022 · Updated 4 years ago
- PyTorch implementation for Patient Knowledge Distillation for BERT Model Compression ☆204 · Sep 20, 2019 · Updated 6 years ago
- Code for the paper "Weight Poisoning Attacks on Pre-trained Models" (ACL 2020) ☆143 · Sep 22, 2025 · Updated 5 months ago
- A tool for holistic analysis of language generation systems ☆471 · Sep 22, 2025 · Updated 5 months ago
- Implementation of a Quantized Transformer Model ☆19 · Mar 20, 2019 · Updated 7 years ago
- Supporting example for "A Rust SentencePiece implementation" ☆20 · Jun 7, 2020 · Updated 5 years ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,927 · Feb 14, 2023 · Updated 3 years ago
- Code for "Kernelized Bayesian Softmax for Text Generation" (NeurIPS 2019) ☆16 · Nov 20, 2019 · Updated 6 years ago
- ☆221 · Jun 8, 2020 · Updated 5 years ago
- Code for using and evaluating SpanBERT. ☆906 · Jul 25, 2023 · Updated 2 years ago
- PyTorch code for meta seq2seq learning ☆43 · Jan 14, 2020 · Updated 6 years ago
- ☆178 · Jul 31, 2020 · Updated 5 years ago
- ☆13 · Apr 16, 2021 · Updated 4 years ago
- Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab. ☆3,155 · Jan 22, 2024 · Updated 2 years ago
- Search-based Neural Structured Learning for Sequential Question Answering ☆33 · Jun 12, 2023 · Updated 2 years ago
- Successfully training approximations to full-rank matrices for efficiency in deep learning. ☆17 · Jan 5, 2021 · Updated 5 years ago
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference ☆162 · Mar 25, 2022 · Updated 3 years ago
- ☆99 · Jul 7, 2020 · Updated 5 years ago
- A toolkit for evaluating the linguistic knowledge and transferability of contextual representations. Code for "Linguistic Knowledge and T… ☆210 · Oct 20, 2021 · Updated 4 years ago
- ☆16 · Aug 20, 2021 · Updated 4 years ago
- ☆28 · Nov 28, 2021 · Updated 4 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,124 · Apr 20, 2022 · Updated 3 years ago
- Implementation of the NLI model in our ACL 2019 paper: Augmenting Neural Networks with First-order Logic. ☆45 · Nov 3, 2020 · Updated 5 years ago
- Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI) ☆200 · Jul 6, 2023 · Updated 2 years ago
- Cascaded Text Generation with Markov Transformers ☆130 · Mar 20, 2023 · Updated 3 years ago