lucidrains / block-recurrent-transformer-pytorchView external linksLinks
Implementation of Block Recurrent Transformer - Pytorch
☆224Aug 20, 2024Updated last year
Alternatives and similar repositories for block-recurrent-transformer-pytorch
Users that are interested in block-recurrent-transformer-pytorch are comparing it to the libraries listed below
Sorting:
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk☆47Jul 16, 2023Updated 2 years ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts☆123Oct 17, 2024Updated last year
- Implementation of Infini-Transformer in Pytorch☆112Jan 4, 2025Updated last year
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate …☆641Jul 17, 2023Updated 2 years ago
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P…☆207Feb 14, 2024Updated 2 years ago
- Recurrent Memory Transformer☆155Aug 14, 2023Updated 2 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"☆59Oct 22, 2023Updated 2 years ago
- Official Repository for Efficient Linear-Time Attention Transformers.☆18Jun 2, 2024Updated last year
- Convolutions for Sequence Modeling☆911Jun 13, 2024Updated last year
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…☆14Oct 17, 2023Updated 2 years ago
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation☆90Oct 11, 2024Updated last year
- ☆259Jun 6, 2025Updated 8 months ago
- Pytorch implementation of Compressive Transformers, from Deepmind☆163Oct 4, 2021Updated 4 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…☆53Oct 22, 2023Updated 2 years ago
- Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT☆224Aug 20, 2024Updated last year
- Implementation of a U-net complete with efficient attention as well as the latest research findings☆292May 3, 2024Updated last year
- ☆16Mar 13, 2023Updated 2 years ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch☆135Oct 15, 2025Updated 4 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…☆67Apr 24, 2024Updated last year
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch☆804Jan 30, 2026Updated 2 weeks ago
- A Transformer made of Rotation-equivariant Attention using Vector Neurons☆101Aug 1, 2023Updated 2 years ago
- Official code for the paper "Attention as a Hypernetwork"☆47Jun 22, 2024Updated last year
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing☆49Jan 27, 2022Updated 4 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena☆207Aug 26, 2023Updated 2 years ago
- Standalone Product Key Memory module in Pytorch - for augmenting Transformer models☆87Nov 1, 2025Updated 3 months ago
- A repository for research on medium sized language models.☆77May 23, 2024Updated last year
- Implementation of Nvidia's NeuralPlexer, for end-to-end differentiable design of functional small-molecules and ligand-binding proteins, …☆52Nov 20, 2023Updated 2 years ago
- Implementation of Discrete Key / Value Bottleneck, in Pytorch☆88Jul 9, 2023Updated 2 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings☆46May 23, 2023Updated 2 years ago
- Implementation of the Llama architecture with RLHF + Q-learning☆170Feb 1, 2025Updated last year
- Implementation of the convolutional module from the Conformer paper, for use in Transformers☆433May 17, 2023Updated 2 years ago
- source code for NAACL2022 main conference "Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs"☆10Sep 26, 2022Updated 3 years ago
- Advanced Formal Language Theory (263-5352-00L; Frühjahr 2023)☆10Feb 21, 2023Updated 2 years ago
- Implementation of Hyena Hierarchy in JAX☆10Apr 30, 2023Updated 2 years ago
- ☆11Oct 11, 2023Updated 2 years ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta☆13Nov 11, 2024Updated last year
- Implementation of a holodeck, written in Pytorch☆18Nov 1, 2023Updated 2 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch☆104Oct 10, 2023Updated 2 years ago