google-research / long-range-arena
Long Range Arena for Benchmarking Efficient Transformers
☆757 · Updated last year
Alternatives and similar repositories for long-range-arena
Users interested in long-range-arena are comparing it to the libraries listed below.
- Transformer based on a variant of attention with linear complexity with respect to sequence length (a minimal sketch of this idea follows the list) ☆768 · Updated last year
- PyTorch library for fast transformer implementations ☆1,710 · Updated 2 years ago
- Sequence modeling with Mega ☆295 · Updated 2 years ago
- An implementation of Performer, a linear attention-based transformer, in PyTorch ☆1,132 · Updated 3 years ago
- Implementation of https://srush.github.io/annotated-s4 ☆495 · Updated 2 years ago
- Fast block-sparse matrices for PyTorch ☆545 · Updated 4 years ago
- ☆352 · Updated last year
- Reformer, the efficient Transformer, in PyTorch ☆2,169 · Updated last year
- ☆376 · Updated last year
- Understanding the Difficulty of Training Transformers ☆329 · Updated 3 years ago
- Fully featured implementation of Routing Transformer ☆292 · Updated 3 years ago
- The entmax mapping and its loss, a family of sparse softmax alternatives ☆437 · Updated 11 months ago
- Code for the ALiBi method for transformer language models (ICLR 2022) ☆530 · Updated last year
- A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆758 · Updated last year
- Maximal update parametrization (µP) ☆1,526 · Updated 10 months ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,112 · Updated 3 years ago
- Transformers for longer sequences ☆612 · Updated 2 years ago
- My take on a practical implementation of Linformer for PyTorch ☆415 · Updated 2 years ago
- Implementation of https://arxiv.org/abs/1904.00962 (the LAMB optimizer) ☆374 · Updated 4 years ago
- Implementation of rotary embeddings, from the RoFormer paper, in PyTorch ☆681 · Updated 6 months ago
- Library for 8-bit optimizers and quantization routines ☆716 · Updated 2 years ago
- Flexible components pairing 🤗 Transformers with PyTorch Lightning ☆609 · Updated 2 years ago
- Sinkhorn Transformer - practical implementation of Sparse Sinkhorn Attention ☆263 · Updated 3 years ago
- Implementation of RETRO, DeepMind's retrieval-based attention network, in PyTorch ☆865 · Updated last year
- FastFormers - highly efficient transformer models for NLU ☆705 · Updated 2 months ago
- Implementation of Linformer for PyTorch ☆285 · Updated last year
- Tutel MoE: an optimized Mixture-of-Experts library with support for DeepSeek FP8/FP4 ☆824 · Updated this week
- Official PyTorch implementation of Long-Short Transformer (NeurIPS 2021) ☆225 · Updated 3 years ago
- An implementation of local windowed attention for language modeling ☆450 · Updated 4 months ago
- ☆178 · Updated last year
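
Several of the repositories above (the linear-complexity Transformer, Performer, Linformer, and the fast-transformer library) build on the same core trick: replace the softmax attention matrix with a kernel feature map so that attention factorizes and costs O(n·d²) instead of O(n²·d) in sequence length. Below is a minimal, non-causal sketch of that idea, assuming the elu(x)+1 feature map popularized by Katharopoulos et al.; it is an illustrative approximation, not the API of any library listed here.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention (illustrative sketch, not any repo's API).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    Uses the feature map phi(x) = elu(x) + 1 so scores stay positive and
    the softmax can be dropped, letting attention factorize as
    phi(Q) @ (phi(K)^T V): O(n * d^2) instead of O(n^2 * d).
    """
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    # Aggregate keys and values once into a (batch, heads, d, d) summary.
    kv = torch.einsum('bhnd,bhne->bhde', k, v)
    # Per-query normalizer: phi(q_i) . sum_j phi(k_j).
    z = torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2))
    # Each output row mixes the shared kv summary.
    out = torch.einsum('bhnd,bhde->bhne', q, kv)
    return out / (z.unsqueeze(-1) + eps)

# Usage: cost grows linearly with sequence length.
q = k = v = torch.randn(2, 8, 4096, 64)
y = linear_attention(q, k, v)  # shape (2, 8, 4096, 64)
```

The causal variant used for language modeling replaces the global kv summary with a running prefix sum over positions, which is where the constant-memory recurrent formulation of these models comes from.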