syncdoth/RetNet

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/syncdoth/RetNet)

syncdoth / RetNet

Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent, and chunkwise forward.

☆227

Alternatives and similar repositories for RetNet

Users that are interested in RetNet are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

fkodom / yet-another-retnet
View on GitHub
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http…
☆105Nov 24, 2023Updated 2 years ago
Jamie-Stirling / RetNet
View on GitHub
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
☆1,211Oct 22, 2023Updated 2 years ago
prateekstark / retnet
View on GitHub
☆14Jul 26, 2023Updated 2 years ago
ShaderManager / RetNet
View on GitHub
PyTorch implementation of Retentive Network: A Successor to Transformer for Large Language Models
☆14Jul 20, 2023Updated 3 years ago
RL10x / RetNet
View on GitHub
an implementation of paper"Retentive Network: A Successor to Transformer for Large Language Models" https://arxiv.org/pdf/2307.08621.pdf
☆11Jul 25, 2023Updated 2 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
IBM / selective-dense-state-space-model
View on GitHub
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆16Sep 18, 2025Updated 10 months ago
microsoft / torchscale
View on GitHub
Foundation Architecture for (M)LLMs
☆3,133Apr 11, 2024Updated 2 years ago
jiaowoguanren0615 / RetNet_ViT-RMT-
View on GitHub
☆34Jan 9, 2024Updated 2 years ago
sustcsonglin / mamba-triton
View on GitHub
☆52Jan 28, 2024Updated 2 years ago
Cranial-XIX / longhorn
View on GitHub
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆57Dec 4, 2024Updated last year
HKUST-KnowComp / PseudoReasoner
View on GitHub
Official code repository for Findings of EMNLP 2022 paper: PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base Popula…
☆11Oct 18, 2022Updated 3 years ago
johanwind / wind_rwkv
View on GitHub
☆27Feb 26, 2026Updated 4 months ago
lucidrains / gateloop-transformer
View on GitHub
Implementation of GateLoop Transformer in Pytorch and Jax
☆92Jun 18, 2024Updated 2 years ago
Hannibal046 / GridTST
View on GitHub
Source code for Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers
☆19May 29, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
WailordHe / DenseSSM
View on GitHub
A repository for DenseSSMs
☆90Apr 11, 2024Updated 2 years ago
HKUST-KnowComp / MICO
View on GitHub
This is the code repo for Findings of EMNLP2022 paper: MICO: a multi-alternative contrastive learning framework for commonsense knowledg…
☆10Nov 29, 2022Updated 3 years ago
berlino / seq_icl
View on GitHub
☆54May 20, 2024Updated 2 years ago
OpenMOSE / RWKV-Infer
View on GitHub
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy…
☆51Oct 21, 2025Updated 8 months ago
berlino / gated_linear_attention
View on GitHub
☆107Mar 9, 2024Updated 2 years ago
alexisrozhkov / dilated-self-attention
View on GitHub
Implementation of the dilated self attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
☆13Jul 23, 2023Updated 2 years ago
myscience / retnet-pytorch
View on GitHub
Implementation of Retention-Network in PyTorch
☆17Aug 12, 2023Updated 2 years ago
HazyResearch / prefix-linear-attention
View on GitHub
☆62Jul 9, 2024Updated 2 years ago
HKUST-KnowComp / SubeventWriter
View on GitHub
Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Cohere…
☆11Oct 16, 2022Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
Doraemonzzz / hgru-pytorch
View on GitHub
☆29Jul 9, 2024Updated 2 years ago
BlinkDL / LinearAttentionArena
View on GitHub
Here we will test various linear attention designs.
☆62Apr 25, 2024Updated 2 years ago
LegallyCoder / mamba-hf
View on GitHub
Implementation of the Mamba SSM with hf_integration.
☆55Aug 31, 2024Updated last year
EleutherAI / rnngineering
View on GitHub
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆33May 25, 2024Updated 2 years ago
OpenNLPLab / HGRN
View on GitHub
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆68Apr 24, 2024Updated 2 years ago
acosharma / elita-transformer
View on GitHub
Official Repository for Efficient Linear-Time Attention Transformers.
☆18Jun 2, 2024Updated 2 years ago
renll / SeqBoat
View on GitHub
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40Dec 2, 2023Updated 2 years ago
Benjamin-Walker / selective-ssms-and-linear-cdes
View on GitHub
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
☆17Jan 7, 2025Updated last year
sjelassi / transformers_ssm_copy
View on GitHub
☆40Feb 26, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
test-time-training / ttt-tk
View on GitHub
☆45Nov 1, 2025Updated 8 months ago
kenkenpa2126 / vanilla_transformer_from_scratch_with_JAX
View on GitHub
☆10Dec 18, 2023Updated 2 years ago
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
automl / DeltaProduct
View on GitHub
DeltaProduct is a new linear recurrent neural network architecture that uses products of generalized Householder matrices as state-transi…
☆15Oct 13, 2025Updated 9 months ago
emalach / LinearLM
View on GitHub
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21Jul 29, 2024Updated last year
Noahs-ARK / PaLM
View on GitHub
PyTorch implementation for PaLM: A Hybrid Parser and Language Model.
☆10Jan 7, 2020Updated 6 years ago