lucidrains / rotary-embedding-torch

Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch

☆719

Alternatives and similar repositories for rotary-embedding-torch

Users interested in rotary-embedding-torch are comparing it to the libraries listed below.

lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆466 Updated 2 weeks ago
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆379 Updated 2 years ago
lucidrains / linear-attention-transformer
Transformer based on a variant of attention that has linear complexity with respect to sequence length
☆789 Updated last year
lucidrains / ema-pytorch
A simple way to keep track of an Exponential Moving Average (EMA) version of your Pytorch model
☆600 Updated 7 months ago
lucidrains / mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
☆785 Updated last year
pytorch-labs / attention-gym
Helpful tools and examples for working with flex-attention
☆904 Updated 2 weeks ago
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆351 Updated last year
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆538 Updated last year
lucidrains / linformer
Implementation of Linformer for Pytorch
☆294 Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆487 Updated last year
tatp22 / multidim-positional-encoding
An implementation of 1D, 2D, and 3D positional encoding in Pytorch and TensorFlow
☆600 Updated 9 months ago
Haiyang-W / TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
☆567 Updated 5 months ago
changjonathanc / minLoRA
minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.
☆468 Updated 2 years ago
lucidrains / FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
☆368 Updated last year
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆532 Updated 2 months ago
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer - Pytorch
☆221 Updated 11 months ago
idiap / fast-transformers
Pytorch library for fast transformer implementations
☆1,725 Updated 2 years ago
srush / annotated-s4
Implementation of https://srush.github.io/annotated-s4
☆500 Updated last month
lucidrains / minGRU-pytorch
Implementation of the proposed minGRU in Pytorch
☆300 Updated 4 months ago
facebookresearch / mega
Sequence modeling with Mega.
☆296 Updated 2 years ago
test-time-training / ttt-lm-jax
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆418 Updated 11 months ago
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆288 Updated last month
lucidrains / recurrent-memory-transformer-pytorch
Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch
☆412 Updated 6 months ago
krasserm / perceiver-io
A PyTorch implementation of Perceiver, Perceiver IO and Perceiver AR with PyTorch Lightning scripts for distributed training
☆486 Updated last year
lucidrains / performer-pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
☆1,139 Updated 3 years ago
naver-ai / rope-vit
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
☆371 Updated 7 months ago
sail-sg / Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
☆797 Updated last month
kozistr / pytorch_optimizer
optimizer & lr scheduler & loss function collections in PyTorch
☆318 Updated this week
kyleliang919 / C-Optim
When it comes to optimizers, it's always better to be safe than sorry
☆336 Updated 2 weeks ago
google-research / long-range-arena
Long Range Arena for Benchmarking Efficient Transformers
☆762 Updated last year