gmongaras / Cottention_TransformerLinks

Code for the paper "Cottention: Linear Transformers With Cosine Attention"

☆17

Alternatives and similar repositories for Cottention_Transformer

Users that are interested in Cottention_Transformer are comparing it to the libraries listed below

Sorting:

OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated 11 months ago
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆66Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆27Updated 5 months ago
howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆46Updated 2 weeks ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆32Updated 3 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆33Updated last year
HazyResearch / prefix-linear-attention
☆56Updated last year
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆37Updated last year
RobertCsordas / moeut
☆83Updated 11 months ago
sustcsonglin / mamba-triton
☆49Updated last year
piotrpiekos / MoSA
User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…
☆23Updated 3 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
fla-org / flash-bidirectional-linear-attention
Triton implement of bi-directional (non-causal) linear attention
☆52Updated 6 months ago
chijames / KERPLE
☆19Updated 2 years ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆110Updated 7 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆46Updated last year
Doraemonzzz / hgru2-pytorch
☆23Updated 10 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆118Updated last month
Doraemonzzz / hgru-pytorch
☆27Updated last year
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
sustcsonglin / gated_linear_attention_layer
☆32Updated last year
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆23Updated 8 months ago
fla-org / fla-zoo
Flash-Linear-Attention models beyond language
☆16Updated 2 weeks ago
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆11Updated 3 weeks ago
sjelassi / transformers_ssm_copy
☆33Updated last year
assafbk / DeciMamba
DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)
☆28Updated 3 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆44Updated 8 months ago