DonRL10 / RetNet

An implementation of the paper "Retentive Network: A Successor to Transformer for Large Language Models" https://arxiv.org/pdf/2307.08621.pdf

☆12

Alternatives and similar repositories for RetNet

Users who are interested in RetNet are comparing it to the repositories listed below.

OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆65 · Updated last year
HazyResearch / prefix-linear-attention
☆56 · Updated last year
deep-spin / infinite-former
☆67 · Updated last year
juvi21 / CoPE-cuda
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
☆22 · Updated last year
sustcsonglin / mamba-triton
☆48 · Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 · Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16 · Updated last year
SmerkyG / RWKV_Explained
RWKV, in easy to read code
☆72 · Updated 6 months ago
sustcsonglin / gated_linear_attention_layer
☆31 · Updated last year
dtunai / Griffin-Jax
Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆14 · Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90 · Updated last year
google-deepmind / randomized_positional_encodings
Randomized Positional Encodings Boost Length Generalization of Transformers
☆82 · Updated last year
prateekstark / retnet
☆14 · Updated 2 years ago
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆135 · Updated last year
siyuanseever / llama2Rnn.c
☆12 · Updated last year
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆34 · Updated 11 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆54 · Updated last year
OpenMOSE / RWKV-Infer
Large-scale inference for RWKV v7 (World, PRWKV, Hybrid-RWKV). Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy…
☆44 · Updated 3 weeks ago
YihongDong / FANformer
☆33 · Updated 4 months ago
johanwind / wind_rwkv
☆26 · Updated 2 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 · Updated last year
Doraemonzzz / hgru2-pytorch
☆23 · Updated last year
Doraemonzzz / hgru-pytorch
☆27 · Updated last year
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆64 · Updated 2 years ago
dashstander / block-recurrent-transformer
Pytorch implementation of "Block Recurrent Transformers" (Hutchins & Schlag et al., 2022)
☆85 · Updated 3 years ago
Benjamin-Walker / selective-ssms-and-linear-cdes
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
☆15 · Updated 9 months ago
AntNLP / nope_head_scale
☆26 · Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆28 · Updated 7 months ago
expz / annotated-hyena
An annotated implementation of the Hyena Hierarchy paper
☆34 · Updated 2 years ago
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18 · Updated last year