Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch
☆422 · Updated Jan 6, 2025
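For orientation, the sketch below illustrates the segment-level recurrence the paper describes: a small set of memory tokens is processed together with each segment and then carried into the next one. It is written in plain PyTorch rather than against this repository's API, and all module choices and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# toy dimensions for the sketch (assumptions, not values from the paper or repo)
dim, num_mem, seg_len, num_segments = 64, 8, 32, 3

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)

segments = [torch.randn(1, seg_len, dim) for _ in range(num_segments)]  # toy input segments
memory = torch.zeros(1, num_mem, dim)                                   # initial memory tokens

for seg in segments:
    out = encoder(torch.cat([memory, seg], dim=1))  # memory attends jointly with the segment
    memory = out[:, :num_mem]                       # updated memory, carried to the next segment
    seg_repr = out[:, num_mem:]                     # per-token representations for this segment
```

In the paper's causal setting the memory tokens actually appear twice per segment (a read block at the start and a write block at the end, with the write block's output becoming the next segment's memory); the sketch keeps a single prepended block for brevity.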
Alternatives and similar repositories for recurrent-memory-transformer-pytorch
Users interested in recurrent-memory-transformer-pytorch are comparing it to the libraries listed below.
- Recurrent Memory Transformer ☆156 · Updated Aug 14, 2023
- Implementation of Block Recurrent Transformer - Pytorch ☆224 · Updated Aug 20, 2024
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆231 · Updated Sep 6, 2024
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆655 · Updated Dec 27, 2024
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Updated Aug 26, 2023
- Implementation of Memformer, a Memory-augmented Transformer, in Pytorch ☆126 · Updated Nov 13, 2020
- Implementation of GateLoop Transformer in Pytorch and Jax ☆92 · Updated Jun 18, 2024
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Updated Feb 13, 2023
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h… ☆54 · Updated Jul 2, 2023
- 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch ☆2,183 · Updated Nov 27, 2024
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Updated Oct 9, 2022
- [NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture. ☆776 · Updated Oct 25, 2024
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆56 · Updated this week
- ☆68 · Updated Aug 29, 2024
- A concise but complete full-attention transformer with a set of promising experimental features from various papers ☆5,806 · Updated this week
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in Pytorch ☆207 · Updated Feb 14, 2024
- Another attempt at a long-context / efficient transformer by me ☆38 · Updated Apr 11, 2022
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆102 · Updated Feb 25, 2023
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆379 · Updated Jun 17, 2024
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" ☆389 · Updated Jul 18, 2023
- Convolutions for Sequence Modeling ☆911 · Updated Jun 13, 2024
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,065 · Updated Mar 7, 2024
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated Oct 22, 2023
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆55 · Updated Mar 25, 2025
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ☆879 · Updated Oct 30, 2023
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch ☆73 · Updated Nov 18, 2025
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆46 · Updated May 23, 2023
- Vector (and Scalar) Quantization, in Pytorch ☆3,872 · Updated Feb 12, 2026
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors ☆641 · Updated Jul 17, 2023
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆21 · Updated Mar 15, 2025
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆59 · Updated Oct 22, 2023
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆182 · Updated Jun 20, 2024
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆49 · Updated Jan 27, 2022
- Language Modeling with the H3 State Space Model ☆522 · Updated Sep 29, 2023
- Official Repository for Efficient Linear-Time Attention Transformers. ☆18 · Updated Jun 2, 2024
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆50 · Updated Jun 16, 2023
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆135 · Updated Oct 15, 2025
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆804 · Updated Jan 30, 2026
- Implementation of a U-net complete with efficient attention as well as the latest research findings ☆292 · Updated May 3, 2024