lucidrains / product-key-memoryView external linksLinks
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
☆87Nov 1, 2025Updated 3 months ago
Alternatives and similar repositories for product-key-memory
Users that are interested in product-key-memory are comparing it to the libraries listed below
Sorting:
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk☆47Jul 16, 2023Updated 2 years ago
- Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT☆224Aug 20, 2024Updated last year
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…☆53Oct 22, 2023Updated 2 years ago
- Explorations into the recently proposed Taylor Series Linear Attention☆100Aug 18, 2024Updated last year
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P…☆207Feb 14, 2024Updated 2 years ago
- Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or columns of a 2d feature map, …☆39Aug 3, 2021Updated 4 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings☆46May 23, 2023Updated 2 years ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts☆123Oct 17, 2024Updated last year
- A Transformer made of Rotation-equivariant Attention using Vector Neurons☆101Aug 1, 2023Updated 2 years ago
- Implementation of a holodeck, written in Pytorch☆18Nov 1, 2023Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- Implementation of the GLOM model for text☆11Mar 4, 2021Updated 4 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"☆59Oct 22, 2023Updated 2 years ago
- ☆16Dec 19, 2024Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning☆170Feb 1, 2025Updated last year
- Axial Positional Embedding for Pytorch☆84Feb 25, 2025Updated 11 months ago
- Implementation of Discrete Key / Value Bottleneck, in Pytorch☆88Jul 9, 2023Updated 2 years ago
- An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols☆16Aug 3, 2021Updated 4 years ago
- Multidimensional indexing for tensors☆137Jul 17, 2023Updated 2 years ago
- Latent Diffusion Language Models☆70Sep 20, 2023Updated 2 years ago
- An implementation of (Induced) Set Attention Block, from the Set Transformers paper☆67Jan 10, 2023Updated 3 years ago
- Pytorch implementation of Compressive Transformers, from Deepmind☆163Oct 4, 2021Updated 4 years ago
- To be a next-generation DL-based phenotype prediction from genome mutations.☆19May 17, 2021Updated 4 years ago
- Implementation of Nvidia's NeuralPlexer, for end-to-end differentiable design of functional small-molecules and ligand-binding proteins, …☆52Nov 20, 2023Updated 2 years ago
- ☆12Sep 13, 2025Updated 5 months ago
- ☆23Jun 25, 2021Updated 4 years ago
- PyTorch Implementation of NeurIPS 2020 paper "Learning Sparse Prototypes for Text Generation"☆22Jul 8, 2021Updated 4 years ago
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation☆90Oct 11, 2024Updated last year
- ☆21Mar 15, 2023Updated 2 years ago
- GPT, but made only out of MLPs☆89May 25, 2021Updated 4 years ago
- Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention☆270Aug 10, 2021Updated 4 years ago
- ☆10Jun 8, 2024Updated last year
- ☆62Dec 8, 2023Updated 2 years ago
- How to use tensorboard in fastai☆21Jul 10, 2019Updated 6 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention☆220Feb 13, 2023Updated 3 years ago
- Based on https://github.com/fatchord/WaveRNN☆24May 3, 2020Updated 5 years ago
- Implementation of Block Recurrent Transformer - Pytorch☆224Aug 20, 2024Updated last year
- Implementation of Hyena Hierarchy in JAX☆10Apr 30, 2023Updated 2 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Aug 12, 2023Updated 2 years ago