lucidrains/lookahead-keys-attention

Causal Attention with Lookahead Keys

☆28

Alternatives and similar repositories for lookahead-keys-attention

Users that are interested in lookahead-keys-attention are comparing it to the libraries listed below.

lucidrains / RIM-pytorch
Implementation of Recurrent Independent Mechanisms in Pytorch
☆27 · Apr 6, 2026 · Updated 3 months ago
lucidrains / x-evolution
Implementation of various evolutionary algorithms, starting with evolutionary strategies
☆51 · May 10, 2026 · Updated 2 months ago
lucidrains / sdft-pytorch
Explorations into the proposed SDFT, Self-Distillation Enables Continual Learning, from Shenfeld et al. of MIT
☆32 · Feb 6, 2026 · Updated 5 months ago
lucidrains / discrete-continuous-embed-readout
Embedding and readout for simple multi-categorical and gaussian continuous
☆20 · Jul 5, 2026 · Updated 3 weeks ago
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆49 · Sep 2, 2025 · Updated 10 months ago
lucidrains / fast-weight-attention
Implementation of Fast Weight Attention
☆33 · Jun 3, 2026 · Updated last month
lucidrains / fast-weight-product-key-memory
Implementation of the fast weight product key memory from Sakana AI
☆19 · Apr 1, 2026 · Updated 3 months ago
kerner-lab / Sparse-GPT-Pretraining
A codebase for pretraining multi-billion-scale sparse GPTs.
☆24 · Feb 9, 2026 · Updated 5 months ago
lucidrains / resfit-pytorch
Implementation of ResFit, Residual Off-Policy RL for Finetuning Behavior Cloning Policies
☆17 · Sep 29, 2025 · Updated 10 months ago
lucidrains / neat
Explorations into NEAT and some of its derivative research
☆41 · Updated this week
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆228 · Mar 25, 2026 · Updated 4 months ago
lucidrains / hl-gauss-pytorch
The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch
☆78 · Apr 3, 2026 · Updated 3 months ago
lucidrains / neural-grok
Explorations into the proposed NeuralGrok from Zhou et al. of EPFL
☆15 · Oct 18, 2025 · Updated 9 months ago
lucidrains / metacontroller
Implementation of the MetaController proposed in "Emergent temporal abstractions in autoregressive models enable hierarchical reinforceme…
☆106 · Jul 8, 2026 · Updated 3 weeks ago
Ping-C / optimizer
This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…
☆39 · Mar 2, 2023 · Updated 3 years ago
ryangreenj / bioinformatics_tool_recommendation
☆19 · Apr 25, 2023 · Updated 3 years ago
lucidrains / strassen-attention
Implementation of Strassen attention, from Kozachinskiy et al. of National Center of AI in Chile
☆41 · Jul 8, 2025 · Updated last year
lucidrains / ultra-mem
Implementation of UltraMem, improved Product Key Memory design, from Bytedance AI labs
☆28 · Nov 4, 2025 · Updated 8 months ago
rom1504 / queue_as_dataset
A prototype implementation of the "dataset as a queue" pattern for processing web pages into interleaved image/text content.
☆29 · Nov 16, 2025 · Updated 8 months ago
chenjianhuii / Mechanistic-Data-Attribution
☆16 · May 25, 2026 · Updated 2 months ago
lucidrains / populora
Implementation and explorations into PopuLoRA, Co-Evolving LLM Populations for Reasoning Self-Play
☆15 · Jun 3, 2026 · Updated last month
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆143 · Jul 17, 2026 · Updated last week
lucidrains / disco-rl-pytorch
Implementation and explorations into DiscoRL, Discovering state-of-the-art reinforcement learning algorithms, David Silver's last work at…
☆21 · Jun 13, 2026 · Updated last month
lucidrains / hyper-connections
Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public
☆187 · May 13, 2026 · Updated 2 months ago
lucidrains / coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
☆47 · Jul 16, 2023 · Updated 3 years ago
lucidrains / x-transformers-rl
Implementation of a transformer for reinforcement learning using `x-transformers`
☆72 · Sep 25, 2025 · Updated 10 months ago
lucidrains / minGRU-pytorch
Implementation of the proposed minGRU in Pytorch
☆325 · Dec 10, 2025 · Updated 7 months ago
SakanaAI / fast-weight-product-key-memory
Code for Fast-weight Product Key Memory (FwPKM)
☆19 · Mar 18, 2026 · Updated 4 months ago
kingsonman / homeostatic-neural-networks
☆28 · Jul 14, 2024 · Updated 2 years ago
lucidrains / poly-attention
Implementation of Poly-attention, a higher-order self-attention proposed by Chakrabarti et al. of Columbia
☆53 · Jul 22, 2026 · Updated last week
lucidrains / transformer-lm-gan
Explorations into adversarial losses on top of autoregressive loss for language modeling
☆41 · Dec 21, 2025 · Updated 7 months ago
adalca / research-ideas
☆10 · Feb 10, 2022 · Updated 4 years ago
uvadlc / uvadlc_practicals_2022
Repository for the code assignment of the Deep Learning 1 course, Fall 2022 edition
☆20 · Dec 9, 2022 · Updated 3 years ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆93 · Jun 18, 2024 · Updated 2 years ago
lucidrains / lbm-training-framework
Training framework for Large Behavioral Models
☆28 · Sep 17, 2025 · Updated 10 months ago
cunningham-lab / epi
Emergent property inference.
☆11 · Jul 6, 2023 · Updated 3 years ago
kora-labs / chromax
Chromax is a breeding simulator based on JAX.
☆10 · Jun 6, 2025 · Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆101 · Aug 18, 2024 · Updated last year
lucidrains / improving-transformers-world-model-for-rl
Implementation of the new SOTA for model based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in Pytorch
☆155 · May 2, 2025 · Updated last year