guolinke / fused_ops
☆10 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for fused_ops
- A PyTorch implementation of the Transformer, experimenting with both Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm). ☆2 · Updated 4 years ago
- Python pdb for multiple processes ☆32 · Updated 2 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆112 · Updated 5 years ago
- Unofficial implementation of https://arxiv.org/abs/2112.05682 for linear memory cost attention in PyTorch ☆12 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆27 · Updated 2 weeks ago
- [ICML 2020] Code for "PowerNorm: Rethinking Batch Normalization in Transformers" (https://arxiv.org/abs/2003.07845) ☆119 · Updated 3 years ago
- A PyTorch implementation of Adafactor (https://arxiv.org/pdf/1804.04235.pdf) ☆24 · Updated 5 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- Code for the NVIDIA winning solution to the 2nd OGB-LSC challenge at NeurIPS 2022, on the PCQM4Mv2 dataset ☆17 · Updated 2 years ago
- BANG is a new pretraining model to bridge the gap between Autoregressive (AR) and Non-Autoregressive (NAR) generation. AR and NAR generat… ☆28 · Updated 2 years ago
- Code for the ICML 2020 paper "Improving Transformer Optimization Through Better Initialization" ☆89 · Updated 3 years ago
- Implementation of the multi-branch attentive Transformer (MAT). ☆33 · Updated 4 years ago
- Source code for the NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference" ☆44 · Updated 2 years ago
- [NeurIPS 2022] Official implementation of "Your Transformer May Not be as Powerful as You Expect" ☆33 · Updated last year
- (ACL-IJCNLP 2021) "Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models" ☆21 · Updated 2 years ago
- Implementation of the Triangle Multiplicative module, used in AlphaFold2 as an efficient way to mix rows or columns of a 2D feature map, … ☆29 · Updated 3 years ago
- Code for the EMNLP 2020 paper CoDIR ☆41 · Updated 2 years ago
- Torch Distributed Experimental ☆116 · Updated 3 months ago
- Source code for "Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation" ☆19 · Updated 5 years ago
- Transformation library for LightGBM ☆33 · Updated last year
- lanmt ebm ☆11 · Updated 4 years ago