maum-ai / pnlp-mixer

Unofficial PyTorch implementation of pNLP-Mixer: an Efficient all-MLP Architecture for Language (https://arxiv.org/abs/2202.04350)

☆64

Alternatives and similar repositories for pnlp-mixer

Users interested in pnlp-mixer are comparing it to the libraries listed below.


lucidrains / RQ-Transformer
Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization"
☆121 · Updated 3 years ago
lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in PyTorch
☆177 · Updated 4 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆206 · Updated 2 years ago
microsoft / ResiDual
ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802
☆96 · Updated 2 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch
☆101 · Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆300 · Updated 2 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in PyTorch
☆230 · Updated last year
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆119 · Updated last year
lucidrains / insertion-deletion-ddpm
Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models
☆30 · Updated 3 years ago
erksch / fnet-pytorch
Unofficial PyTorch implementation of Google's FNet: Mixing Tokens with Fourier Transforms. With checkpoints.
☆77 · Updated 3 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in PyTorch
☆120 · Updated 4 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆217 · Updated 2 years ago
lucidrains / memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
☆69 · Updated 2 years ago
lucidrains / axial-positional-embedding
Axial Positional Embedding for PyTorch
☆83 · Updated 7 months ago
lucidrains / agent-attention-pytorch
Implementation of Agent Attention in PyTorch
☆91 · Updated last year
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49 · Updated 5 years ago
amazon-science / masked-diffusion-lm
Official implementation for the paper "A Cheaper and Better Diffusion Language Model with Soft-Masked Noise"
☆59 · Updated 2 years ago
lucidrains / n-grammer-pytorch
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch
☆76 · Updated 2 years ago
lucidrains / NWT-pytorch
Implementation of NWT, audio-to-video generation, in PyTorch
☆92 · Updated 3 years ago
AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n))) for JAX and PyTorch
☆182 · Updated 2 years ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch
☆129 · Updated this week
lucidrains / light-recurrent-unit-pytorch
Implementation of a Light Recurrent Unit in PyTorch
☆49 · Updated last year
aliutkus / spe
Relative Positional Encoding for Transformers with Linear Complexity
☆65 · Updated 3 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in PyTorch
☆123 · Updated 4 years ago
lucidrains / rvq-vae-gpt
My attempts at applying the SoundStream design to learned tokenization of text and then applying hierarchical attention to text generation
☆88 · Updated last year
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆80 · Updated last year
lucidrains / linformer
Implementation of Linformer for PyTorch
☆299 · Updated last year
sooftware / luna-transformer
A PyTorch Implementation of the Luna: Linear Unified Nested Attention
☆41 · Updated 4 years ago
jaketae / alibi
PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
☆30 · Updated 3 years ago
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer in PyTorch
☆221 · Updated last year