Doraemonzzz / tnn-pytorch

☆20

Alternatives and similar repositories for tnn-pytorch

Users who are interested in tnn-pytorch are comparing it to the libraries listed below.

lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆34 Updated 2 years ago
radarFudan / Curse-of-memory
Curse-of-memory phenomenon of RNNs in sequence modelling
☆19 Updated 3 months ago
sustcsonglin / gated_linear_attention_layer
☆32 Updated last year
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of TNN from our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆79 Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16 Updated last year
sustcsonglin / mamba-triton
☆49 Updated last year
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆66 Updated last year
srush / mamba-scans
Blog post
☆17 Updated last year
HazyResearch / prefix-linear-attention
☆55 Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55 Updated 11 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 Updated last year
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion in our EMNLP 2023 paper - Accelerating Toeplitz…
☆14 Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated last year
Doraemonzzz / hgru-pytorch
☆27 Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆27 Updated 5 months ago
microsoft / EfficientLongSequenceModeling
☆51 Updated 2 years ago
RakitinDen / pytorch-recursive-gumbel-max-trick
Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021
☆13 Updated 3 years ago
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆15 Updated 4 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62 Updated last year
Eliyas0007 / Pytorch-Intention
Unofficial implementation of the paper "Exploring the Space of Key-Value-Query Models with Intention"
☆12 Updated 2 years ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆44 Updated 10 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40 Updated last year
Benjamin-Walker / selective-ssms-and-linear-cdes
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
☆15 Updated 7 months ago
yikangshen / megablocks
☆20 Updated last year
Doraemonzzz / hgru2-pytorch
☆23 Updated 10 months ago
RobertCsordas / moe_layer
sigma-MoE layer
☆20 Updated last year
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆66 Updated 10 months ago
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆11 Updated last month
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18 Updated last year