berlino/gated_linear_attention

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/berlino/gated_linear_attention)

berlino / gated_linear_attention

☆107

Alternatives and similar repositories for gated_linear_attention

Users that are interested in gated_linear_attention are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OpenNLPLab / HGRN
View on GitHub
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆68Apr 24, 2024Updated 2 years ago
sustcsonglin / mamba-triton
View on GitHub
☆52Jan 28, 2024Updated 2 years ago
dtunai / Griffin-Jax
View on GitHub
Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆15May 10, 2024Updated 2 years ago
BlinkDL / LinearAttentionArena
View on GitHub
Here we will test various linear attention designs.
☆62Apr 25, 2024Updated 2 years ago
proger / hippogriff
View on GitHub
Griffin MQA + Hawk Linear RNN Hybrid
☆89Apr 13, 2026Updated 3 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
OpenNLPLab / HGRN2
View on GitHub
HGRN2: Gated Linear RNNs with State Expansion
☆58Aug 20, 2024Updated last year
johanwind / wind_rwkv
View on GitHub
☆27Feb 26, 2026Updated 5 months ago
Doraemonzzz / hgru2-pytorch
View on GitHub
☆24Sep 25, 2024Updated last year
glassroom / heinsen_attention
View on GitHub
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆25Jun 6, 2024Updated 2 years ago
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
Cranial-XIX / longhorn
View on GitHub
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆57Dec 4, 2024Updated last year
lxxue / prefix_sum
View on GitHub
A PyTorch wrapper of parallel exclusive scan in CUDA
☆12May 25, 2023Updated 3 years ago
WailordHe / DenseSSM
View on GitHub
A repository for DenseSSMs
☆90Apr 11, 2024Updated 2 years ago
sustcsonglin / gated_linear_attention_layer
View on GitHub
☆32Jan 7, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
00ffcc / chunkRWKV6
View on GitHub
continous batching and parallel acceleration for RWKV6
☆23Jun 28, 2024Updated 2 years ago
proger / accelerated-scan
View on GitHub
Accelerated First Order Parallel Associative Scan
☆198Jan 7, 2026Updated 6 months ago
yikangshen / megablocks
View on GitHub
☆20May 30, 2024Updated 2 years ago
OpenNLPLab / TransnormerLLM
View on GitHub
Official implementation of TransNormerLLM: A Faster and Better LLM
☆256Jan 23, 2024Updated 2 years ago
jopetty / word-problem
View on GitHub
Experiments on the impact of depth in transformers and SSMs.
☆44Oct 23, 2025Updated 9 months ago
DCGM / SoftCTC
View on GitHub
This repository contains source codes for SoftCTC. Original paper can be found here: https://arxiv.org/abs/2212.02135
☆19Mar 7, 2023Updated 3 years ago
machine-discovery / deer
View on GitHub
Parallelizing non-linear sequential models over the sequence length
☆57Jun 23, 2025Updated last year
radarFudan / mamba-minimal-jax
View on GitHub
☆36Nov 22, 2024Updated last year
RobertCsordas / moe_layer
View on GitHub
sigma-MoE layer
☆21Jan 5, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
dangxingyu / rnn-icrag
View on GitHub
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated 2 years ago
tau-nlp / scrolls
View on GitHub
The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".
☆69Jan 12, 2024Updated 2 years ago
kyegomez / Griffin
View on GitHub
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆58Oct 27, 2025Updated 9 months ago
Doraemonzzz / xmixers
View on GitHub
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Sep 4, 2025Updated 10 months ago
HazyResearch / based
View on GitHub
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆256Jun 6, 2025Updated last year
fla-org / flash-linear-attention
View on GitHub
🚀 Efficient implementations for emerging model architectures
☆5,425Updated this week
kazuki-irie / kv-memory-brain
View on GitHub
Official Code Repository for the paper "Key-value memory in the brain"
☆32Feb 25, 2025Updated last year
EleutherAI / rnngineering
View on GitHub
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆33May 25, 2024Updated 2 years ago
tech-srl / layer_norm_expressivity_role
View on GitHub
Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023)
☆60Sep 27, 2024Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
sustcsonglin / disco-pointer
View on GitHub
Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span …
☆14Aug 25, 2023Updated 2 years ago
HazyResearch / zoology
View on GitHub
Understand and test language model architectures on synthetic tasks.
☆278Mar 22, 2026Updated 4 months ago
lucaslingle / transformer_vq
View on GitHub
Official implementation of 'Transformer-VQ: Linear-Time Transformers via Vector Quantization' (ICLR 2024)
☆199Dec 4, 2023Updated 2 years ago
AntNLP / nope_head_scale
View on GitHub
☆29May 4, 2024Updated 2 years ago
shawntan / stickbreaking-attention
View on GitHub
Stick-breaking attention
☆63Jul 1, 2025Updated last year
lucidrains / gateloop-transformer
View on GitHub
Implementation of GateLoop Transformer in Pytorch and Jax
☆93Jun 18, 2024Updated 2 years ago
microsoft / EfficientLongSequenceModeling
View on GitHub
☆54Jan 19, 2023Updated 3 years ago