bdusell / stack-attention

Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"

☆17

Alternatives and similar repositories for stack-attention

Users who are interested in stack-attention are comparing it to the repositories listed below.

jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆19 Updated 2 years ago
yikangshen / megablocks
☆20 Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40 Updated last year
dangxingyu / rnn-icrag
Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 Updated last year
ermongroup / fast_feedforward_computation
Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021
☆27 Updated 3 years ago
Doraemonzzz / nanoTransNormer
☆11 Updated last year
radarFudan / Curse-of-memory
Curse-of-memory phenomenon of RNNs in sequence modelling
☆19 Updated 3 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated last year
sustcsonglin / gated_linear_attention_layer
☆32 Updated last year
Eliyas0007 / Pytorch-Intention
Unofficial implementation of the paper "Exploring the Space of Key-Value-Query Models with Intention"
☆12 Updated 2 years ago
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15 Updated last year
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16 Updated last month
UW-Madison-Lee-Lab / Expressive_Power_of_LoRA
Code for "The Expressive Power of Low-Rank Adaptation".
☆20 Updated last year
Doraemonzzz / hgru2-pytorch
☆23 Updated 10 months ago
IBM / selective-dense-state-space-model
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆14 Updated 3 months ago
hengyuan-hu / jax-vs-pytorch
☆12 Updated 5 months ago
archinetai / vat-pytorch
Virtual Adversarial Training (VAT) techniques in PyTorch
☆17 Updated 3 years ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55 Updated 11 months ago
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆15 Updated 4 months ago
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18 Updated last year
Doraemonzzz / hgru-pytorch
☆27 Updated last year
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆20 Updated last year
yangjackie / Topics-on-diffusion-generative-models
☆26 Updated last month
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆11 Updated last month
BYU-PCCL / prompt-compression-contrastive-coding
Companion repository to "Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models"
☆14 Updated 2 years ago
Doraemonzzz / tnn-pytorch
☆20 Updated 2 years ago
CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆45 Updated 4 years ago
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14 Updated last year
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆38 Updated last year
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10 Updated 2 years ago