lucidrains/gateloop-transformer

lucidrains / gateloop-transformer

Implementation of GateLoop Transformer in Pytorch and Jax

☆93

Alternatives and similar repositories for gateloop-transformer

Users that are interested in gateloop-transformer are comparing it to the libraries listed below.

lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆122 · Oct 17, 2024 · Updated last year
lucidrains / coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
☆47 · Jul 16, 2023 · Updated 3 years ago
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆46 · May 23, 2023 · Updated 3 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 · Feb 25, 2023 · Updated 3 years ago
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆68 · Apr 24, 2024 · Updated 2 years ago
lucidrains / kalman-filtering-attention
Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"
☆61 · Oct 22, 2023 · Updated 2 years ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆101 · Aug 18, 2024 · Updated last year
lucidrains / product-key-memory
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
☆87 · Nov 1, 2025 · Updated 8 months ago
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated 2 years ago
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆170 · Feb 1, 2025 · Updated last year
lucidrains / mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
☆92 · Dec 22, 2023 · Updated 2 years ago
tobiaskatsch / GatedLinearRNN
☆30 · Feb 27, 2024 · Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53 · Oct 22, 2023 · Updated 2 years ago
Doraemonzzz / nanoTransNormer
☆11 · Oct 11, 2023 · Updated 2 years ago
robert-lieck / RBN
Recursive Bayesian Networks
☆11 · May 11, 2025 · Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆112 · Jan 4, 2025 · Updated last year
lucidrains / flash-genomics-model
My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h…
☆54 · Jul 2, 2023 · Updated 3 years ago
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆126 · Jul 26, 2024 · Updated 2 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230 · Sep 6, 2024 · Updated last year
johanwind / wind_rwkv
☆27 · Feb 26, 2026 · Updated 5 months ago
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20 · Oct 9, 2022 · Updated 3 years ago
Doraemonzzz / hgru-pytorch
☆29 · Jul 9, 2024 · Updated 2 years ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆25 · Jun 6, 2024 · Updated 2 years ago
lucidrains / holodeck-pytorch
Implementation of a holodeck, written in Pytorch
☆19 · Nov 1, 2023 · Updated 2 years ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆198 · Jan 7, 2026 · Updated 6 months ago
irhum / hyena
JAX/Flax implementation of the Hyena Hierarchy
☆35 · Apr 27, 2023 · Updated 3 years ago
yikangshen / megablocks
☆20 · May 30, 2024 · Updated 2 years ago
lucidrains / quartic-transformer
Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)
☆56 · Mar 25, 2025 · Updated last year
NousResearch / StripedHyenaTrainer
☆67 · Dec 8, 2023 · Updated 2 years ago
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21 · Jul 29, 2024 · Updated last year
HazyResearch / prefix-linear-attention
☆62 · Jul 9, 2024 · Updated 2 years ago
lucidrains / rvq-vae-gpt
My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation
☆90 · Oct 11, 2024 · Updated last year
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆17 · Jun 2, 2024 · Updated 2 years ago
lucidrains / blackbox-gradient-sensing
Implementation and explorations into Blackbox Gradient Sensing (BGS), an evolutionary strategies approach proposed in a Google Deepmind p…
☆20 · Apr 17, 2026 · Updated 3 months ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 · Jan 8, 2024 · Updated 2 years ago
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer - Pytorch
☆226 · Aug 20, 2024 · Updated last year
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…
☆14 · Oct 17, 2023 · Updated 2 years ago
epfml / pam
☆16 · Dec 9, 2023 · Updated 2 years ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆143 · Jul 17, 2026 · Updated last week