lucidrains / discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in PyTorch
☆87 · Updated last year
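The core idea of the discrete key / value bottleneck (Träuble et al.) is that an encoder feature is snapped to its nearest key in a codebook, and the learnable *value* paired with that key is what flows onward to the decoder. The following is a minimal from-scratch sketch of that mechanism, not this repository's actual API; the class name, argument names, and initialization choices are illustrative assumptions.

```python
import torch

class KeyValueBottleneck(torch.nn.Module):
    """Hypothetical sketch of a discrete key-value bottleneck.

    Keys form a (typically frozen) codebook; each key is paired with a
    learnable value vector. An input is routed to its nearest key, and
    the corresponding value is returned. Not the repository's API.
    """

    def __init__(self, dim_key, dim_value, num_pairs):
        super().__init__()
        # Keys are commonly frozen after initialization so the routing stays stable.
        self.keys = torch.nn.Parameter(torch.randn(num_pairs, dim_key), requires_grad=False)
        # Values are the part that keeps learning.
        self.values = torch.nn.Parameter(torch.randn(num_pairs, dim_value))

    def forward(self, x):
        # x: (batch, dim_key) encoder features
        dist = torch.cdist(x, self.keys)   # (batch, num_pairs) pairwise distances
        idx = dist.argmin(dim=-1)          # index of nearest key per input
        return self.values[idx]            # retrieved values: (batch, dim_value)

bottleneck = KeyValueBottleneck(dim_key=64, dim_value=32, num_pairs=128)
out = bottleneck(torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 32])
```

Because gradients only reach the few values that were actually retrieved, updates stay local to the selected key-value pairs, which is what gives the bottleneck its robustness under distribution shift.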
Alternatives and similar repositories for discrete-key-value-bottleneck-pytorch:
Users interested in discrete-key-value-bottleneck-pytorch are comparing it to the libraries listed below.
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆99 · Updated 2 years ago
- Implementation of Hourglass Transformer, in PyTorch, from Google and OpenAI ☆87 · Updated 3 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
- JAX implementation of ViT-VQGAN ☆83 · Updated 2 years ago
- Standalone Product Key Memory module in PyTorch, for augmenting Transformer models ☆78 · Updated 9 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆123 · Updated last year
- ☆51 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆118 · Updated 6 months ago
- Language Quantized AutoEncoders ☆104 · Updated 2 years ago
- [ICML 2023] Reflected Diffusion Models (https://arxiv.org/abs/2304.04740) ☆157 · Updated last year
- ☆51 · Updated 10 months ago
- Implementation of Zorro, Masked Multimodal Transformer, in PyTorch ☆97 · Updated last year
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆55 · Updated 11 months ago
- A Domain-Agnostic Benchmark for Self-Supervised Learning ☆107 · Updated last year
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- Implementation of some personal helper functions for Einops, my favorite tensor manipulation library ❤️ ☆54 · Updated 2 years ago
- ☆29 · Updated 2 years ago
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick the top-k ☆46 · Updated last year
- Another attempt at a long-context / efficient transformer by me ☆37 · Updated 3 years ago
- Official repository of Pretraining Without Attention (BiGS); BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆116 · Updated last year
- Procedural Image Programs for Representation Learning (NeurIPS 2022) ☆33 · Updated 7 months ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023) ☆33 · Updated 2 years ago
- [NeurIPS 2021] Code for Unsupervised Learning of Compositional Energy Concepts ☆61 · Updated 2 years ago
- The official PyTorch implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight… ☆56 · Updated 2 years ago
- Experiment with diffusion models that you can run on your local Jupyter instances ☆59 · Updated 6 months ago
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆80 · Updated 3 years ago
- Official PyTorch implementation of "Rosetta Neurons: Mining the Common Units in a Model Zoo" ☆30 · Updated last year
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆203 · Updated last year
- ☆33 · Updated 7 months ago
- Un-*** 50-billion multimodality dataset ☆24 · Updated 2 years ago