lucidrains / FLASH-pytorchLinks

Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"

☆368

Alternatives and similar repositories for FLASH-pytorch

Users that are interested in FLASH-pytorch are comparing it to the libraries listed below

Sorting:

OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196Updated 2 years ago
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆470Updated 2 weeks ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆294Updated last year
lucidrains / linear-attention-transformer
Transformer based on a variant of attention that is linear complexity in respect to sequence length
☆793Updated last year
wuch15 / Fastformer
A pytorch &keras implementation and demo of Fastformer.
☆189Updated 2 years ago
evelinehong / Transformer_Relative_Position_PyTorch
Implement the paper "Self-Attention with Relative Position Representations"
☆135Updated 4 years ago
ZhuiyiTechnology / roformer
Rotary Transformer
☆996Updated 3 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆225Updated 3 years ago
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆417Updated 3 years ago
lucidrains / mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
☆788Updated last year
kyegomez / AttentionIsOFFByOne
Implementation of "Attention Is Off By One" by Evan Miller
☆193Updated last year
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆538Updated last year
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆379Updated 2 years ago
bzhangGo / rmsnorm
Root Mean Square Layer Normalization
☆247Updated 2 years ago
transformer-vq / transformer_vq
☆196Updated last year
rishikksh20 / FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
☆259Updated 4 years ago
ZhuiyiTechnology / GAU-alpha
基于Gated Attention Unit的Transformer模型（尝鲜版）
☆97Updated 2 years ago
dropreg / R-Drop
☆881Updated last year
facebookresearch / mega
Sequence modeling with Mega.
☆296Updated 2 years ago
cmsflash / efficient-attention
An implementation of the efficient attention module.
☆320Updated 4 years ago
google-research / vmoe
☆656Updated last month
lucidrains / rotary-embedding-torch
Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch
☆719Updated last week
BIGBALLON / distribuuuu
The pure and clear PyTorch Distributed Training Framework.
☆275Updated last year
twistedcubic / attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆165Updated 4 years ago
JunnYu / RoFormer_pytorch
RoFormer V1 & V2 pytorch
☆506Updated 3 years ago
fkodom / grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …
☆173Updated last year
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆352Updated last year
lucidrains / axial-attention
Implementation of Axial attention - attending to multi-dimensional data efficiently
☆384Updated 3 years ago
princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆196Updated 2 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆214Updated 2 years ago