AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch
☆180 · Updated 2 years ago
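The headline repository implements the chunked attention algorithm from "Self-attention Does Not Need O(n²) Memory" (the paper behind one of the entries below). As a rough sketch of the core idea only, not the library's API (the function name `chunked_attention` is hypothetical): keys and values are scanned in blocks, and the partial softmaxes are merged with an online log-sum-exp. The actual library also chunks queries and uses checkpointing to reach the advertised memory bound.

```python
import torch

def chunked_attention(q, k, v, chunk=256):
    """Attention that scans keys/values in blocks, keeping peak score
    memory at O(n * chunk) per head instead of O(n^2)."""
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)  # running weighted sum of values
    lse = torch.full(q.shape[:-1], float("-inf"),
                     dtype=q.dtype, device=q.device)  # running log-sum-exp
    for i in range(0, k.shape[-2], chunk):
        k_c, v_c = k[..., i:i + chunk, :], v[..., i:i + chunk, :]
        s = q @ k_c.transpose(-2, -1) * scale          # (..., n, chunk) scores
        lse_c = torch.logsumexp(s, dim=-1)             # this block's normalizer
        lse_new = torch.logaddexp(lse, lse_c)          # merged normalizer
        # Rescale the old accumulator, then add this block's contribution.
        out = out * torch.exp(lse - lse_new).unsqueeze(-1) \
            + torch.exp(s - lse_new.unsqueeze(-1)) @ v_c
        lse = lse_new
    return out

q = k = v = torch.randn(1, 8, 4096, 64)      # (batch, heads, seq, head_dim)
out = chunked_attention(q, k, v, chunk=64)   # chunk near sqrt(seq) keeps blocks small
```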
Alternatives and similar repositories for memory-efficient-attention:
Users interested in memory-efficient-attention are comparing it to the libraries listed below.
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆210 · Updated 2 years ago
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" ☆370 · Updated last year
- Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch ☆251 · Updated 2 years ago
- Implementation of Flash Attention in Jax ☆204 · Updated 11 months ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆204 · Updated last year
- DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight … ☆235 · Updated last year
- Implementation of a Transformer, but completely in Triton ☆257 · Updated 2 years ago
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework) ☆186 · Updated 2 years ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆225 · Updated 5 months ago
- Sequence modeling with Mega. ☆298 · Updated 2 years ago
- Simple and efficient RevNet-Library for PyTorch with XLA and DeepSpeed support and parameter offload ☆126 · Updated 2 years ago
- ☆338 · Updated 10 months ago
- Named tensors with first-class dimensions for PyTorch ☆321 · Updated last year
- A library to inspect and extract intermediate layers of PyTorch models. ☆471 · Updated 2 years ago
- Implementation of Fast Transformer in Pytorch ☆172 · Updated 3 years ago
- ☆199 · Updated 2 years ago
- Library for 8-bit optimizers and quantization routines. ☆717 · Updated 2 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆225 · Updated 2 years ago
- Easy-to-use AdaHessian optimizer (PyTorch) ☆77 · Updated 4 years ago
- Contrastive Language-Image Pretraining ☆142 · Updated 2 years ago
- ☆372 · Updated last year
- Collection of the latest, greatest, deep learning optimizers (for Pytorch) - CNN, NLP suitable ☆211 · Updated 3 years ago
- Implementation of Feedback Transformer in Pytorch ☆105 · Updated 3 years ago
- Implementation of Nyström Self-attention, from the paper Nyströmformer ☆127 · Updated last year
- Code for the ALiBi method for transformer language models (ICLR 2022); a minimal sketch of the bias follows this list ☆515 · Updated last year
- Block-sparse primitives for PyTorch ☆153 · Updated 3 years ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) ☆116 · Updated 3 years ago
- Slicing a PyTorch Tensor Into Parallel Shards ☆298 · Updated 3 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆102 · Updated 3 years ago
- Amos optimizer with JEstimator lib. ☆81 · Updated 9 months ago
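The ALiBi entry above reduces to a single closed-form bias: head h adds -m_h * (i - j) to the score of query i attending to key j <= i, with head slopes m_h forming the geometric sequence 2^(-8/n), 2^(-16/n), ... for n heads (a power of two in the paper). Here is a minimal sketch of that bias under this slope schedule; it is not the official repository's code.

```python
import torch

def alibi_bias(n_heads, seq_len):
    """Causal ALiBi bias: entry [h, i, j] is slope[h] * (j - i), i.e. the
    paper's -m_h * (i - j), which is <= 0 for keys j at or before query i."""
    # Geometric slopes 2^(-8/n), 2^(-16/n), ...; assumes n_heads is a power of 2.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    dist = pos[None, :] - pos[:, None]               # (seq, seq): j - i
    return slopes[:, None, None] * dist[None, :, :]  # (heads, seq, seq)

# Added to q @ k.transpose(-2, -1) / sqrt(d) before the causal mask and softmax;
# no positional embeddings are added to the token embeddings.
```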