lucidrains / lookahead-keys-attention
Causal Attention with Lookahead Keys
☆26 · Updated 2 months ago
Alternatives and similar repositories for lookahead-keys-attention
Users interested in lookahead-keys-attention are comparing it to the libraries listed below.
- Implementation of Agent Attention in PyTorch · ☆92 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 · ☆57 · Updated last year
- Implementation of Infini-Transformer in PyTorch · ☆113 · Updated 11 months ago
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior · ☆41 · Updated last year
- State Space Models · ☆71 · Updated last year
- Implementation of GateLoop Transformer in PyTorch and JAX · ☆91 · Updated last year
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" · ☆59 · Updated 2 years ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" · ☆166 · Updated 10 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) · ☆127 · Updated 2 years ago
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4 · ☆87 · Updated last year
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) · ☆15 · Updated 11 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) · ☆34 · Updated 2 years ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) · ☆79 · Updated 2 years ago
- ☆16 · Updated 2 years ago
- Explorations into the recently proposed Taylor Series Linear Attention · ☆100 · Updated last year
- A repository for DenseSSMs · ☆89 · Updated last year
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in PyTorch · ☆120 · Updated 3 months ago
- C-Mixup for NeurIPS 2022 · ☆73 · Updated last year
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make it practical in Fast and Simplex, Ro… · ☆47 · Updated 3 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts · ☆120 · Updated last year
- ☆33 · Updated last year
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" · ☆56 · Updated last month
- Implementation of Zorro, Masked Multimodal Transformer, in PyTorch · ☆97 · Updated 2 years ago
- Bayesian Attention Modules · ☆35 · Updated 4 years ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 · ☆20 · Updated 2 years ago
- An annotated implementation of the Hyena Hierarchy paper · ☆34 · Updated 2 years ago
- The official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT … · ☆39 · Updated last year
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch · ☆101 · Updated 2 years ago
- Official source code for "Graph Neural Networks for Learning Equivariant Representations of Neural Networks", ICLR 2024 (oral) · ☆84 · Updated last year
- PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network · ☆51 · Updated last year