pmichel31415/are-16-heads-really-better-than-1

pmichel31415 / are-16-heads-really-better-than-1

Code for the paper "Are Sixteen Heads Really Better than One?"

☆175

Alternatives and similar repositories for are-16-heads-really-better-than-1

Users who are interested in are-16-heads-really-better-than-1 are comparing it to the libraries listed below.

lena-voita / the-story-of-heads
This is a repository with the code for the ACL 2019 paper "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, t…
☆324 · Aug 2, 2021 · Updated 4 years ago
facebookresearch / adaptive-span
Transformer training code for sequential tasks
☆610 · Sep 14, 2021 · Updated 4 years ago
clarkkev / attention-analysis
☆474 · Apr 4, 2021 · Updated 5 years ago
JetRunner / BERT-of-Theseus
⛵️ The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
☆316 · Jun 12, 2023 · Updated 3 years ago
intersun / CoDIR
Code for EMNLP 2020 paper CoDIR
☆41 · Oct 4, 2022 · Updated 3 years ago
IBM / PoWER-BERT
Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro…
☆63 · Sep 17, 2025 · Updated 10 months ago
pmichel31415 / teapot-nlp
Tool for Evaluating Adversarial Perturbations on Text
☆61 · Feb 27, 2022 · Updated 4 years ago
clovaai / length-adaptive-transformer
Official PyTorch Implementation of Length-Adaptive Transformer (ACL 2021)
☆102 · Nov 2, 2020 · Updated 5 years ago
yaohungt / TransformerDissection
[EMNLP'19] Summary for Transformer Understanding
☆53 · Nov 26, 2019 · Updated 6 years ago
caoyu-noob / CASe
Implementation for the paper "Unsupervised Domain Adaptation on Reading Comprehension"
☆30 · May 21, 2020 · Updated 6 years ago
yzh119 / BPT
Source code of the paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
☆127 · Apr 5, 2021 · Updated 5 years ago
mrqa / MRQA-Shared-Task-2019
Resources for the MRQA 2019 Shared Task
☆294 · Aug 5, 2021 · Updated 4 years ago
mitchellgordon95 / bert-prune
☆17 · May 14, 2020 · Updated 6 years ago
facebookresearch / SentAugment
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…
☆359 · Feb 22, 2022 · Updated 4 years ago
allenai / longformer
Longformer: The Long-Document Transformer
☆2,201 · Feb 8, 2023 · Updated 3 years ago
intersun / PKD-for-BERT-Model-Compression
PyTorch implementation for Patient Knowledge Distillation for BERT Model Compression
☆203 · Sep 20, 2019 · Updated 6 years ago
neulab / RIPPLe
Code for the paper "Weight Poisoning Attacks on Pre-trained Models" (ACL 2020)
☆142 · Sep 22, 2025 · Updated 9 months ago
neulab / compare-mt
A tool for holistic analysis of language generation systems
☆471 · Sep 22, 2025 · Updated 9 months ago
Andrew-Tierno / QuantizedTransformer
Implementation of a Quantized Transformer Model
☆20 · Mar 20, 2019 · Updated 7 years ago
guillaume-be / SentencePiece-Rust-example
Supporting example for "A Rust SentencePiece implementation"
☆20 · Jun 7, 2020 · Updated 6 years ago
NingMiao / KerBS
Code for "Kernelized Bayesian Softmax for Text Generation" (NeurIPS 2019)
☆16 · Nov 20, 2019 · Updated 6 years ago
facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
☆2,925 · Feb 14, 2023 · Updated 3 years ago
laiguokun / Funnel-Transformer
☆220 · Jun 8, 2020 · Updated 6 years ago
facebookresearch / SpanBERT
Code for using and evaluating SpanBERT.
☆908 · Jul 25, 2023 · Updated 2 years ago
brendenlake / meta_seq2seq
PyTorch code for meta seq2seq learning
☆44 · Jan 14, 2020 · Updated 6 years ago
harvardnlp / urnng
☆179 · Jul 31, 2020 · Updated 5 years ago
pdufter / staticlama
☆13 · Apr 16, 2021 · Updated 5 years ago
microsoft / DynSP
Search-based Neural Structured Learning for Sequential Question Answering
☆33 · Jun 12, 2023 · Updated 3 years ago
huawei-noah / Pretrained-Language-Model
Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.
☆3,162 · Jan 22, 2024 · Updated 2 years ago
BayesWatch / deficient-efficient
Successfully training approximations to full-rank matrices for efficiency in deep learning.
☆16 · Jan 5, 2021 · Updated 5 years ago
rycolab / differentiable-subset-pruning
☆17 · Aug 20, 2021 · Updated 4 years ago
castorini / DeeBERT
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
☆161 · Mar 25, 2022 · Updated 4 years ago
MC-BERT / MC-BERT
☆99 · Jul 7, 2020 · Updated 6 years ago
nelson-liu / contextual-repr-analysis
A toolkit for evaluating the linguistic knowledge and transferability of contextual representations. Code for "Linguistic Knowledge and T…
☆212 · Oct 20, 2021 · Updated 4 years ago
utahnlp / layer_augmentation
Implementation of the NLI model in our ACL 2019 paper: Augmenting Neural Networks with First-order Logic.
☆46 · Nov 3, 2020 · Updated 5 years ago
harvardnlp / pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
☆1,132 · Apr 20, 2022 · Updated 4 years ago
uralik / beamdream
☆28 · Nov 28, 2021 · Updated 4 years ago
seominjoon / denspi
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
☆200 · Jul 6, 2023 · Updated 3 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆130 · Mar 20, 2023 · Updated 3 years ago