google-research / long-range-arenaLinks

Long Range Arena for Benchmarking Efficient Transformers

☆767

Alternatives and similar repositories for long-range-arena

Users that are interested in long-range-arena are comparing it to the libraries listed below

Sorting:

lucidrains / linear-attention-transformer
Transformer based on a variant of attention that is linear complexity in respect to sequence length
☆811Updated last year
idiap / fast-transformers
Pytorch library for fast transformer implementations
☆1,749Updated 2 years ago
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆543Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆301Updated 2 years ago
lucidrains / performer-pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
☆1,156Updated 3 years ago
srush / annotated-s4
Implementation of https://srush.github.io/annotated-s4
☆504Updated 4 months ago
mlpen / Nystromformer
☆385Updated 2 years ago
huggingface / pytorch_block_sparse
Fast Block Sparse Matrices for Pytorch
☆547Updated 4 years ago
google-research / bigbird
Transformers for Longer Sequences
☆622Updated 3 years ago
microsoft / mup
maximal update parametrization (µP)
☆1,621Updated last year
google / flaxformer
☆363Updated last year
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆332Updated 3 years ago
google / seqio
Task-based datasets, preprocessing, and evaluation for sequence models.
☆587Updated 2 weeks ago
lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆296Updated 4 years ago
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆421Updated 3 years ago
lucidrains / mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
☆825Updated 2 years ago
deep-spin / entmax
The entmax mapping and its loss, a family of sparse softmax alternatives.
☆451Updated last year
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆484Updated 4 months ago
lucidrains / RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
☆875Updated 2 years ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆301Updated last year
lucidrains / reformer-pytorch
Reformer, the efficient Transformer, in Pytorch
☆2,184Updated 2 years ago
facebookresearch / bitsandbytes
Library for 8-bit optimizers and quantization routines.
☆780Updated 3 years ago
lucidrains / sinkhorn-transformer
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆268Updated 4 years ago
cybertronai / pytorch-lamb
Implementation of https://arxiv.org/abs/1904.00962
☆377Updated 4 years ago
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆383Updated 2 years ago
lucidrains / mlm-pytorch
An implementation of masked language modeling for Pytorch, made as concise and simple as possible
☆179Updated 2 years ago
lucidrains / rotary-embedding-torch
Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch
☆779Updated 3 months ago
Lightning-Universe / lightning-transformers
Flexible components pairing 🤗 Transformers with Pytorch Lightning
☆612Updated 2 years ago
karpathy / deep-vector-quantization
VQVAEs, GumbelSoftmaxes and friends
☆595Updated 3 years ago
google-research / meliad
☆259Updated 5 months ago