lucidrains / coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
☆47 · Jul 16, 2023 · Updated 2 years ago
Alternatives and similar repositories for coordinate-descent-attention
Users who are interested in coordinate-descent-attention are comparing it to the libraries listed below.
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts · ☆123 · Oct 17, 2024 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings · ☆46 · May 23, 2023 · Updated 2 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… · ☆53 · Oct 22, 2023 · Updated 2 years ago
- Implementation of a holodeck, written in Pytorch · ☆18 · Nov 1, 2023 · Updated 2 years ago
- Implementation of Strassen attention, from Kozachinskiy et al. of National Center of AI in Chile · ☆41 · Jul 8, 2025 · Updated 7 months ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights · ☆19 · Oct 9, 2022 · Updated 3 years ago
- Explorations into the recently proposed Taylor Series Linear Attention · ☆100 · Aug 18, 2024 · Updated last year
- Implementation of Block Recurrent Transformer - Pytorch · ☆224 · Aug 20, 2024 · Updated last year
- Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture" · ☆88 · Oct 13, 2023 · Updated 2 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention · ☆220 · Feb 13, 2023 · Updated 3 years ago
- Exploring finetuning public checkpoints on filter 8K sequences on Pile · ☆115 · Mar 22, 2023 · Updated 2 years ago
- Implementation of Denoising Diffusion for protein design, but using the new Equiformer (successor to SE3 Transformers) with some addition… · ☆57 · Dec 27, 2022 · Updated 3 years ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch · ☆135 · Oct 15, 2025 · Updated 4 months ago
- Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it. · ☆91 · Aug 26, 2023 · Updated 2 years ago
- ☆11 · Nov 7, 2024 · Updated last year
- Fine-tune copilot based on your codebase · ☆12 · Mar 26, 2024 · Updated last year
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Implementation of Hyena Hierarchy in JAX · ☆10 · Apr 30, 2023 · Updated 2 years ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta · ☆13 · Nov 11, 2024 · Updated last year
- source code for NAACL2022 main conference "Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs" · ☆10 · Sep 26, 2022 · Updated 3 years ago
- Tools to isolate speaker and transcribe unstructured audio clips · ☆11 · Dec 4, 2022 · Updated 3 years ago
- Julia implementation of flash-attention operation for neural networks. · ☆11 · May 31, 2023 · Updated 2 years ago
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models · ☆30 · May 31, 2022 · Updated 3 years ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … · ☆15 · Mar 11, 2024 · Updated last year
- ☆13 · Dec 15, 2025 · Updated 2 months ago
- Recursive Bayesian Networks · ☆11 · May 11, 2025 · Updated 9 months ago
- lanmt ebm · ☆12 · Jun 19, 2020 · Updated 5 years ago
- ☆13 · Jun 16, 2021 · Updated 4 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing · ☆49 · Jan 27, 2022 · Updated 4 years ago
- An implementation of (Induced) Set Attention Block, from the Set Transformers paper · ☆67 · Jan 10, 2023 · Updated 3 years ago
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P… · ☆207 · Feb 14, 2024 · Updated 2 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz… · ☆14 · Oct 17, 2023 · Updated 2 years ago
- Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021 · ☆14 · Dec 11, 2021 · Updated 4 years ago
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- The ISA specification for the ZiCondOps extension. · ☆19 · Mar 21, 2024 · Updated last year
- My explorations into editing the knowledge and memories of an attention network · ☆35 · Dec 8, 2022 · Updated 3 years ago
- Code for "Unsupervised Visuomotor Control through Distributional Planning Networks" · ☆10 · Jun 27, 2019 · Updated 6 years ago
- The Concept Bottleneck Shift Detection (CBSD) methods for explaining and detecting various dataset shifts. · ☆14 · Jun 22, 2021 · Updated 4 years ago
- Multidimensional indexing for tensors · ☆137 · Jul 17, 2023 · Updated 2 years ago