lucidrains/Mega-pytorch

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/lucidrains/Mega-pytorch)

lucidrains / Mega-pytorch

Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena

☆207

Alternatives and similar repositories for Mega-pytorch

Users that are interested in Mega-pytorch are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

lucidrains / gated-state-spaces-pytorch
View on GitHub
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Feb 25, 2023Updated 3 years ago
lucidrains / flash-cosine-sim-attention
View on GitHub
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆220Feb 13, 2023Updated 3 years ago
facebookresearch / mega
View on GitHub
Sequence modeling with Mega.
☆303Jan 28, 2023Updated 3 years ago
lucidrains / tranception-pytorch
View on GitHub
Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction
☆32Jun 19, 2022Updated 4 years ago
lucidrains / token-shift-gpt
View on GitHub
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆49Jan 27, 2022Updated 4 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
lucidrains / n-grammer-pytorch
View on GitHub
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch
☆81Dec 4, 2022Updated 3 years ago
XuezheMax / fairseq-apollo
View on GitHub
FairSeq repo with Apollo optimizer
☆113Dec 20, 2023Updated 2 years ago
lucidrains / metaformer-gpt
View on GitHub
Implementation of Metaformer, but in an autoregressive manner
☆26Jun 21, 2022Updated 4 years ago
jenni-ai / T2FW
View on GitHub
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20Oct 9, 2022Updated 3 years ago
lucidrains / RETRO-pytorch
View on GitHub
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
☆879Oct 30, 2023Updated 2 years ago
lucidrains / rvq-vae-gpt
View on GitHub
My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation
☆90Oct 11, 2024Updated last year
da03 / criticize_text_generation
View on GitHub
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆12Mar 18, 2023Updated 3 years ago
lucidrains / gateloop-transformer
View on GitHub
Implementation of GateLoop Transformer in Pytorch and Jax
☆92Jun 18, 2024Updated 2 years ago
lucidrains / recurrent-memory-transformer-pytorch
View on GitHub
Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch
☆424Jan 6, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
lucidrains / mixture-of-attention
View on GitHub
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆122Oct 17, 2024Updated last year
lucidrains / multistream-transformers
View on GitHub
Implementation of Multistream Transformers in Pytorch
☆54Jul 31, 2021Updated 4 years ago
lucidrains / CoLT5-attention
View on GitHub
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230Sep 6, 2024Updated last year
lucidrains / esbn-transformer
View on GitHub
An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols
☆16Aug 3, 2021Updated 4 years ago
lucidrains / agent-attention-pytorch
View on GitHub
Implementation of Agent Attention in Pytorch
☆93Jul 10, 2024Updated 2 years ago
lucidrains / equiformer-diffusion
View on GitHub
Implementation of Denoising Diffusion for protein design, but using the new Equiformer (successor to SE3 Transformers) with some addition…
☆56Dec 27, 2022Updated 3 years ago
lucidrains / coordinate-descent-attention
View on GitHub
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
☆47Jul 16, 2023Updated 3 years ago
lucidrains / memory-efficient-attention-pytorch
View on GitHub
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆394Jul 18, 2023Updated 3 years ago
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
View on GitHub
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…
☆14Oct 17, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
lucidrains / insertion-deletion-ddpm
View on GitHub
Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models
☆30May 31, 2022Updated 4 years ago
lucidrains / compositional-attention-pytorch
View on GitHub
Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi…
☆51May 10, 2022Updated 4 years ago
lucidrains / long-short-transformer
View on GitHub
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆120Aug 4, 2021Updated 4 years ago
lucidrains / jax2torch
View on GitHub
Use Jax functions in Pytorch
☆263Jul 1, 2023Updated 3 years ago
patil-suraj / vit-vqgan
View on GitHub
JAX implementation ViT-VQGAN
☆82Sep 21, 2022Updated 3 years ago
timvieira / vocrf
View on GitHub
Variable-order CRFs with structure learning
☆17Aug 1, 2024Updated last year
lucidrains / rela-transformer
View on GitHub
Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012
☆49Apr 6, 2022Updated 4 years ago
lucidrains / ETSformer-pytorch
View on GitHub
Implementation of ETSformer, state of the art time-series Transformer, in Pytorch
☆154Aug 26, 2023Updated 2 years ago
lucidrains / fast-transformer-pytorch
View on GitHub
Implementation of Fast Transformer in Pytorch
☆176Aug 26, 2021Updated 4 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
lucidrains / Adan-pytorch
View on GitHub
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆251Sep 1, 2022Updated 3 years ago
automl / DeltaProduct
View on GitHub
DeltaProduct is a new linear recurrent neural network architecture that uses products of generalized Householder matrices as state-transi…
☆15Oct 13, 2025Updated 9 months ago
HazyResearch / H3
View on GitHub
Language Modeling with the H3 State Space Model
☆525Sep 29, 2023Updated 2 years ago
state-spaces / s4
View on GitHub
Structured state space sequence models
☆2,911Jul 17, 2024Updated 2 years ago
Doraemonzzz / hgru-pytorch
View on GitHub
☆29Jul 9, 2024Updated 2 years ago
lucidrains / block-recurrent-transformer-pytorch
View on GitHub
Implementation of Block Recurrent Transformer - Pytorch
☆226Aug 20, 2024Updated last year
lucidrains / autoregressive-linear-attention-cuda
View on GitHub
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆46May 23, 2023Updated 3 years ago