lucidrains / discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in Pytorch
☆87 · Updated last year
Alternatives and similar repositories for discrete-key-value-bottleneck-pytorch:
Users who are interested in discrete-key-value-bottleneck-pytorch are comparing it to the libraries listed below.
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆99 · Updated 2 years ago
- Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI ☆86 · Updated 3 years ago
- JAX implementation of ViT-VQGAN ☆82 · Updated 2 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
- ☆51 · Updated 9 months ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆204 · Updated last year
- Experiment with diffusion models that you can run on your local jupyter instances ☆57 · Updated 4 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆116 · Updated 5 months ago
- [ICML 2023] Reflected Diffusion Models (https://arxiv.org/abs/2304.04740) ☆158 · Updated last year
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆46 · Updated last year
- Code for ICLR 2023 Paper, "Stable Target Field for Reduced Variance Score Estimation in Diffusion Models" ☆72 · Updated last year
- ☆51 · Updated last year
- Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization" ☆102 · Updated 2 years ago
- Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023] ☆31 · Updated 6 months ago
- Beyond Straight-Through ☆94 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- ☆29 · Updated 2 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- Standalone Product Key Memory module in Pytorch - for augmenting Transformer models ☆78 · Updated 7 months ago
- Visualizing representations with diffusion based conditional generative model. ☆90 · Updated last year
- Language Quantized AutoEncoders ☆101 · Updated 2 years ago
- Sequence Modeling with Structured State Spaces ☆63 · Updated 2 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆122 · Updated last year
- Implementation of LogAvgExp for Pytorch ☆34 · Updated 2 years ago
- ☆36 · Updated last year
- Official Repository of Pretraining Without Attention (BiGS). BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆116 · Updated last year
- [NeurIPS'20] Code for the Paper Compositional Visual Generation and Inference with Energy Based Models ☆44 · Updated last year
- ☆73 · Updated 2 years ago
- Implementation of some personal helper functions for Einops, my most favorite tensor manipulation library ❤️ ☆54 · Updated 2 years ago
- ☆49 · Updated 4 years ago