kyegomez / SparseAttentionLinks

Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers"

☆92

Alternatives and similar repositories for SparseAttention

Users that are interested in SparseAttention are comparing it to the libraries listed below

Sorting:

kyegomez / Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆56Updated 3 weeks ago
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆115Updated last month
lucidrains / agent-attention-pytorch
Implementation of Agent Attention in Pytorch
☆92Updated last year
transformer-vq / transformer_vq
☆199Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 10 months ago
lucidrains / multimodal-dit-pytorch
Implementation of a multimodal diffusion transformer in Pytorch
☆106Updated last year
WailordHe / DenseSSM
A repository for DenseSSMs
☆89Updated last year
lucidrains / hyper-connections
Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public
☆91Updated 5 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆133Updated 2 weeks ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆376Updated 2 months ago
test-time-training / ttt-lm-kernels
Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆74Updated last year
fkodom / yet-another-retnet
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http…
☆106Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100Updated last year
kyegomez / Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
☆195Updated 3 weeks ago
yu-rp / KANbeFair
A More Fair and Comprehensive Comparison between KAN and MLP
☆176Updated last year
fla-org / flash-bidirectional-linear-attention
Triton implement of bi-directional (non-causal) linear attention
☆56Updated 9 months ago
goombalab / hydra
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆165Updated 9 months ago
Adamdad / rational_kat_cu
☆76Updated 9 months ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆119Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
pengzhangzhi / Awesome-Mamba
Awesome list of papers that extend Mamba to various applications.
☆138Updated 5 months ago
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆47Updated 2 months ago
kyegomez / FlashAttention20
Get down and dirty with FlashAttention2.0 in pytorch, plug in and play no complex CUDA kernels
☆111Updated 2 years ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆134Updated last month
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆132Updated 2 years ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆110Updated this week
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆231Updated last month
TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆103Updated 5 months ago
knotgrass / attention
several types of attention modules written in PyTorch for learning purposes
☆52Updated last year
lucidrains / deep-cross-attention
Implementation of the proposed DeepCrossAttention by Heddes et al at Google research, in Pytorch
☆94Updated 8 months ago