Root Mean Square Layer Normalization
☆266Mar 28, 2023Updated 2 years ago
Alternatives and similar repositories for rmsnorm
Users that are interested in rmsnorm are comparing it to the libraries listed below.
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated last year
- ☆29Jul 9, 2024Updated last year
- Codebase accompanying the paper 'Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts', (Emelin, D…☆11Feb 14, 2023Updated 3 years ago
- Introduction and scripts for ACL-2020 paper "On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation"☆21Jun 23, 2020Updated 5 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated 2 years ago
- Implementation of ICML 22 Paper: Scaling Structured Inference with Randomization☆13Jul 24, 2022Updated 3 years ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer☆64Jul 30, 2023Updated 2 years ago
- Small notebook to preprocess and evaluate images.☆14Nov 11, 2022Updated 3 years ago
- Transformers at any scale☆42Jan 18, 2024Updated 2 years ago
- Zero -- A neural machine translation system☆152May 8, 2023Updated 2 years ago
- [ACL'20] Highway Transformer: A Gated Transformer.☆33Dec 5, 2021Updated 4 years ago
- The official Languini Kitchen repository☆14May 6, 2024Updated last year
- Make downloading scientific data much easier☆11Mar 3, 2026Updated 3 weeks ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023☆138Apr 30, 2024Updated last year
- Transformer related optimization, including BERT, GPT☆6,400Mar 27, 2024Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion, from our EMNLP 2023 paper - Accelerating Toeplitz…☆14Oct 17, 2023Updated 2 years ago
- ☆10Nov 15, 2020Updated 5 years ago
- Source code for "A Lightweight Recurrent Network for Sequence Modeling"☆26Dec 7, 2022Updated 3 years ago
- PyTorch extensions for high performance and large scale training.☆3,404Apr 26, 2025Updated 10 months ago
- Code for "Understanding and Improving Layer Normalization"☆46Dec 8, 2019Updated 6 years ago
- Foundation Architecture for (M)LLMs☆3,137Apr 11, 2024Updated last year
- Understanding the Difficulty of Training Transformers☆332May 31, 2022Updated 3 years ago
- ☆20Apr 17, 2023Updated 2 years ago
- Sequence-level 1F1B schedule for LLMs.☆19Jun 4, 2024Updated last year
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch☆804Jan 30, 2026Updated last month
- This project attempts to maintain the SOTA performance in machine translation☆108Sep 21, 2020Updated 5 years ago
- Neutron: A pytorch based implementation of Transformer and its variants.☆64Aug 10, 2023Updated 2 years ago
- Code for "Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation"☆13Jul 10, 2020Updated 5 years ago
- Codes for DATA: Differentiable ArchiTecture Approximation.☆11Jul 22, 2021Updated 4 years ago
- Python code for training models in the ACL paper, "Beyond BLEU: Training Neural Machine Translation with Semantic Similarity".☆52Dec 20, 2019Updated 6 years ago
- Fast and memory-efficient exact attention☆22,938Updated this week
- Code for the ALiBi method for transformer language models (ICLR 2022)☆554Oct 30, 2023Updated 2 years ago
- ☆32Sep 27, 2021Updated 4 years ago
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization☆16Jun 5, 2018Updated 7 years ago
- Rotary Transformer☆1,090Mar 21, 2022Updated 4 years ago
- Cross Sentence Neural Machine Translation☆11Mar 26, 2018Updated 7 years ago
- This code repository presents the pytorch implementation of the paper “Implicit Deep Latent Variable Models for Text Generation”(EMNLP 20…☆55Mar 11, 2022Updated 4 years ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Mar 12, 2024Updated 2 years ago