lucidrains / block-recurrent-transformer-pytorchLinks

Implementation of Block Recurrent Transformer - Pytorch

☆223

Alternatives and similar repositories for block-recurrent-transformer-pytorch

Users that are interested in block-recurrent-transformer-pytorch are comparing it to the libraries listed below

Sorting:

lucidrains / recurrent-memory-transformer-pytorch
Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch
☆420Updated 11 months ago
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆224Updated last year
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230Updated last year
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207Updated 2 years ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 11 months ago
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆125Updated last year
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆120Updated last year
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆384Updated 2 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
fkodom / yet-another-retnet
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http…
☆106Updated 2 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆219Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆301Updated 2 years ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆91Updated last year
booydar / LM-RMT
Recurrent Memory Transformer
☆154Updated 2 years ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100Updated last year
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆134Updated last month
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆373Updated last year
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆168Updated 10 months ago
lucidrains / soft-moe-pytorch
Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
☆337Updated 8 months ago
bobby-he / simplified_transformers
☆293Updated 11 months ago
google-research / meliad
☆259Updated 6 months ago
lucidrains / agent-attention-pytorch
Implementation of Agent Attention in Pytorch
☆92Updated last year
ctlllll / SGConv
☆164Updated 2 years ago
syncdoth / RetNet
Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,…
☆227Updated last year
lucidrains / Adan-pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆252Updated 3 years ago
lucidrains / PaLM-jax
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)
☆189Updated 3 years ago
kyegomez / Sophia
Effortless plugin and play Optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
☆384Updated last year
srush / do-we-need-attention
☆166Updated 2 years ago
facebookresearch / dropout
Code release for "Dropout Reduces Underfitting"
☆317Updated 2 years ago