bzhangGo / rmsnormLinks

Root Mean Square Layer Normalization

☆249

Alternatives and similar repositories for rmsnorm

Users that are interested in rmsnorm are comparing it to the libraries listed below

Sorting:

facebookresearch / mega
Sequence modeling with Mega.
☆297Updated 2 years ago
fkodom / grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …
☆173Updated last year
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆538Updated last year
princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆196Updated 2 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆214Updated 2 years ago
epfml / dynamic-sparse-flash-attention
☆147Updated 2 years ago
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆379Updated 2 years ago
lucidrains / FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
☆368Updated last year
berlino / gated_linear_attention
☆106Updated last year
OpenNLPLab / TransnormerLLM
Official implementation of TransNormerLLM: A Faster and Better LLM
☆247Updated last year
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273Updated 3 years ago
huggingface / nn_pruning
Prune a model while finetuning or training.
☆403Updated 3 years ago
bobby-he / simplified_transformers
☆292Updated 7 months ago
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆98Updated 2 years ago
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆323Updated 5 months ago
bojone / rerope
Rectified Rotary Position Embeddings
☆378Updated last year
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆61Updated 2 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆229Updated 11 months ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆295Updated last year
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196Updated 2 years ago
kyegomez / AttentionIsOFFByOne
Implementation of "Attention Is Off By One" by Evan Miller
☆194Updated last year
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆353Updated last year
booydar / LM-RMT
Recurrent Memory Transformer
☆150Updated last year
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆130Updated last year
mgmalek / efficient_cross_entropy
☆114Updated last year
YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆102Updated last year
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆225Updated 3 years ago
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer - Pytorch
☆221Updated 11 months ago
haoliuhl / ringattention
Large Context Attention
☆720Updated 6 months ago
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆471Updated 3 weeks ago