pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆175 · Apr 1, 2020 · Updated 5 years ago
Alternatives and similar repositories for are-16-heads-really-better-than-1
Users who are interested in are-16-heads-really-better-than-1 are comparing it to the libraries listed below.
- This is a repository with the code for the ACL 2019 paper "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, t… ☆319 · Aug 2, 2021 · Updated 4 years ago
- Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro… ☆62 · Sep 17, 2025 · Updated 5 months ago
- ☆471 · Apr 4, 2021 · Updated 4 years ago
- Official PyTorch implementation of Length-Adaptive Transformer (ACL 2021) ☆102 · Nov 2, 2020 · Updated 5 years ago
- Transformer training code for sequential tasks ☆610 · Sep 14, 2021 · Updated 4 years ago
- Code for EMNLP 2020 paper CoDIR ☆41 · Oct 4, 2022 · Updated 3 years ago
- Code for the paper "Weight Poisoning Attacks on Pre-trained Models" (ACL 2020) ☆143 · Sep 22, 2025 · Updated 4 months ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 4 years ago
- ⛵️ The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020). ☆315 · Jun 12, 2023 · Updated 2 years ago
- ☆221 · Jun 8, 2020 · Updated 5 years ago
- Longformer: The Long-Document Transformer ☆2,186 · Feb 8, 2023 · Updated 3 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆359 · Feb 22, 2022 · Updated 3 years ago
- Tool for Evaluating Adversarial Perturbations on Text ☆61 · Feb 27, 2022 · Updated 3 years ago
- Resources for the MRQA 2019 Shared Task ☆294 · Aug 5, 2021 · Updated 4 years ago
- Code for using and evaluating SpanBERT. ☆903 · Jul 25, 2023 · Updated 2 years ago
- Implementation for the paper "Unsupervised Domain Adaptation on Reading Comprehension" ☆30 · May 21, 2020 · Updated 5 years ago
- [EMNLP'19] Summary for Transformer Understanding ☆53 · Nov 26, 2019 · Updated 6 years ago
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference ☆162 · Mar 25, 2022 · Updated 3 years ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,924 · Feb 14, 2023 · Updated 3 years ago
- A tool for holistic analysis of language generation systems ☆471 · Sep 22, 2025 · Updated 4 months ago
- Supporting example for "A Rust SentencePiece implementation" ☆20 · Jun 7, 2020 · Updated 5 years ago
- Code for "Kernelized Bayesian Softmax for Text Generation" (NeurIPS 2019) ☆16 · Nov 20, 2019 · Updated 6 years ago
- Implementation of a Quantized Transformer Model ☆19 · Mar 20, 2019 · Updated 6 years ago
- "Learning Discrete and Continuous Factors of Data via Alternating Disentanglement", accepted at ICML 2019 ☆21 · Aug 22, 2019 · Updated 6 years ago
- The repository for the paper "When Do You Need Billions of Words of Pretraining Data?" ☆21 · Nov 10, 2020 · Updated 5 years ago
- PyTorch code for meta seq2seq learning ☆43 · Jan 14, 2020 · Updated 6 years ago
- ☆99 · Jul 7, 2020 · Updated 5 years ago
- ☆178 · Jul 31, 2020 · Updated 5 years ago
- pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference ☆61 · Dec 8, 2022 · Updated 3 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,123 · Apr 20, 2022 · Updated 3 years ago
- PyTorch implementation of Patient Knowledge Distillation for BERT Model Compression ☆203 · Sep 20, 2019 · Updated 6 years ago
- Code release for our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987). ☆185 · Jun 12, 2023 · Updated 2 years ago
- Python library for backtranslation (with Google Translate) ☆12 · Jan 11, 2020 · Updated 6 years ago
- Cascaded Text Generation with Markov Transformers ☆130 · Mar 20, 2023 · Updated 2 years ago
- Implementation of the NLI model in our ACL 2019 paper: Augmenting Neural Networks with First-order Logic. ☆44 · Nov 3, 2020 · Updated 5 years ago
- Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI) ☆200 · Jul 6, 2023 · Updated 2 years ago
- Code examples for CMU CS11-731, Machine Translation and Sequence-to-sequence Models ☆35 · Nov 4, 2019 · Updated 6 years ago
- OOD Generalization and Detection (ACL 2020) ☆59 · Apr 15, 2020 · Updated 5 years ago
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ☆2,371 · Mar 23, 2024 · Updated last year