guolinke / fused_ops
☆10 · Updated 2 years ago
Related projects:
- ☆20 · Updated 3 years ago
- ☆14 · Updated this week
- ☆16 · Updated last week
- Unofficially implements https://arxiv.org/abs/2112.05682 to get linear memory cost on attention for PyTorch · ☆12 · Updated 2 years ago
- Implementation of Lie Transformer, Equivariant Self-Attention, in PyTorch · ☆87 · Updated 3 years ago
- A PyTorch implementation of the Transformer, experimenting with both Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm) · ☆2 · Updated 4 years ago
- Python pdb for multiple processes · ☆30 · Updated last year
- lanmt ebm · ☆11 · Updated 4 years ago
- ☆15 · Updated this week
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes · ☆16 · Updated 3 weeks ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in PyTorch · ☆45 · Updated 3 years ago
- Using FlexAttention to compute attention with different masking patterns · ☆28 · Updated last week
- ☆35 · Updated 10 months ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models · ☆21 · Updated 2 years ago
- Implementation of the multi-branch attentive Transformer (MAT) · ☆33 · Updated 4 years ago
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization" · ☆89 · Updated 3 years ago
- ☆20 · Updated this week
- Code for SegTree Transformer (ICLR-RLGM 2019) · ☆27 · Updated 4 years ago
- Unofficial PyTorch implementation of "Step-unrolled Denoising Autoencoders for Text Generation" · ☆22 · Updated last year
- ☆11 · Updated 3 years ago
- A Python library for highly configurable transformers, easing model architecture search and experimentation · ☆50 · Updated 2 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 · ☆49 · Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism · ☆50 · Updated 3 months ago
- ☆29 · Updated last year
- Implementation of Tranception, an attention network paired with retrieval, which is SOTA for protein fitness prediction · ☆31 · Updated 2 years ago
- The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu… · ☆48 · Updated 3 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) · ☆58 · Updated 2 years ago
- ☆16 · Updated this week
- Code of the NVIDIA winning solution to the 2nd OGB-LSC at the NeurIPS 2022 challenge with dataset PCQM4Mv2 · ☆17 · Updated last year
- PyTorch Examples repo for "ReZero is All You Need: Fast Convergence at Large Depth" · ☆62 · Updated last month