jaketae / alibiLinks

PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

☆29

Alternatives and similar repositories for alibi

Users that are interested in alibi are comparing it to the libraries listed below

Sorting:

lucidrains / insertion-deletion-ddpm
Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models
☆30Updated 3 years ago
lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in Pytorch
☆177Updated 4 years ago
microsoft / ResiDual
ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802
☆95Updated 2 years ago
lucidrains / multistream-transformers
Implementation of Multistream Transformers in Pytorch
☆54Updated 4 years ago
lucidrains / hourglass-transformer-pytorch
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
☆94Updated 3 years ago
lucidrains / axial-positional-embedding
Axial Positional Embedding for Pytorch
☆83Updated 7 months ago
maum-ai / pnlp-mixer
Unofficial PyTorch Implementation for pNLP-Mixer: an Efficient all-MLP Architecture for Language (https://arxiv.org/abs/2202.04350)
☆63Updated 3 years ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆122Updated 11 months ago
lucidrains / rela-transformer
Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012
☆49Updated 3 years ago
lucidrains / RQ-Transformer
Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization"
☆120Updated 3 years ago
ChristophReich1996 / HyperMixer
PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022].
☆17Updated 3 years ago
lucidrains / compositional-attention-pytorch
Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi…
☆51Updated 3 years ago
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
lucidrains / rvq-vae-gpt
My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation
☆89Updated 11 months ago
patil-suraj / simple-diffusion
An implementation of simple diffusion in PyTorch (and JAX)
☆35Updated 2 years ago
lucidrains / gradnorm-pytorch
A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch
☆110Updated last month
lucidrains / light-recurrent-unit-pytorch
Implementation of a Light Recurrent Unit in Pytorch
☆49Updated last year
lucidrains / memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
☆69Updated 2 years ago
lucidrains / kalman-filtering-attention
Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"
☆59Updated last year
jiaweizzhao / ZerO-initialization
☆75Updated 2 years ago
lucidrains / NWT-pytorch
Implementation of NWT, audio-to-video generation, in Pytorch
☆91Updated 3 years ago
lucidrains / discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in Pytorch
☆88Updated 2 years ago
lucidrains / n-grammer-pytorch
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch
☆76Updated 2 years ago
lucidrains / coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
☆46Updated 2 years ago
lucidrains / deep-linear-network
A simple implementation of a deep linear Pytorch module
☆21Updated 4 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207Updated 2 years ago
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆44Updated 2 years ago
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49Updated 5 years ago
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆32Updated 2 years ago
lucidrains / tableformer-pytorch
Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch
☆39Updated 3 years ago