xl402 / performer
TensorFlow implementation of a linear attention architecture (a minimal sketch of the idea follows below)
☆44 · Updated 3 years ago
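For context, the core idea behind a linear attention architecture such as Performer: replace softmax(QK^T)V with phi(Q)(phi(K)^T V), so the cost grows linearly rather than quadratically in sequence length. Below is a minimal TensorFlow sketch assuming the elu(x)+1 feature map of Katharopoulos et al. (2020); Performer itself substitutes FAVOR+ random features. The function name, shapes, and `eps` are illustrative assumptions, not this repo's actual API.

```python
import tensorflow as tf

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (batch, heads, n, d); v: (batch, heads, n, d_v).
    # Positive feature map phi(x) = elu(x) + 1 (Katharopoulos et al., 2020);
    # Performer swaps this for FAVOR+ random features.
    q = tf.nn.elu(q) + 1.0
    k = tf.nn.elu(k) + 1.0
    # Associativity: (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V),
    # dropping the cost from O(n^2 d) to O(n d d_v).
    kv = tf.einsum('bhnd,bhne->bhde', k, v)
    # Row-wise normalizer: phi(Q) dotted with sum_n phi(K)_n.
    z = tf.einsum('bhnd,bhd->bhn', q, tf.reduce_sum(k, axis=2))
    return tf.einsum('bhnd,bhde->bhne', q, kv) / (z[..., None] + eps)
```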
Alternatives and similar repositories for performer:
Users interested in performer are comparing it to the libraries listed below
- Implements MLP-Mixer (https://arxiv.org/abs/2105.01601) with the CIFAR-10 dataset. ☆54 · Updated 2 years ago
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆160 · Updated last year
- Simple stochastic weight averaging callback for Keras ☆63 · Updated 3 years ago
- Implementation of Feedback Transformer in PyTorch ☆105 · Updated 3 years ago
- ☆213 · Updated 4 years ago
- Efficient Transformers for research, PyTorch and TensorFlow, using Locality Sensitive Hashing ☆93 · Updated 5 years ago
- Unofficial PyTorch implementation of Fastformer, based on the paper "Fastformer: Additive Attention Can Be All You Need" ☆134 · Updated 3 years ago
- Code for scaling Transformers ☆26 · Updated 4 years ago
- Fourth-place solution to the "OpenVaccine: COVID-19 mRNA Vaccine Degradation Prediction" competition organized by Stanford University and Kaggle ☆20 · Updated 4 years ago
- Implementation of Fast Transformer in PyTorch ☆172 · Updated 3 years ago
- Simple NumPy implementation of the FAVOR+ attention mechanism (https://teddykoker.com/2020/11/performers/); see the sketch after this list ☆37 · Updated 4 years ago
- Implements sharpness-aware minimization (https://arxiv.org/abs/2010.01412) in TensorFlow 2; see the sketch after this list ☆60 · Updated 3 years ago
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- Cyclemoid implementation for PyTorch ☆87 · Updated 2 years ago
- Implementation of modern data augmentation techniques in TensorFlow 2.x to be used in your training pipeline. ☆34 · Updated 4 years ago
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX Runtime. ☆126 · Updated 4 years ago
- Implementation of ETSformer, a state-of-the-art time-series Transformer, in PyTorch ☆152 · Updated last year
- Simple gradient checkpointing for eager mode execution ☆46 · Updated 4 years ago
- Implementation of self-supervised image-level contrastive pretraining methods using Keras. ☆69 · Updated 3 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆62 · Updated 2 years ago
- Local Attention - Flax module for Jax ☆20 · Updated 3 years ago
- Demonstrates knowledge distillation for image-based models in Keras. ☆52 · Updated 3 years ago
- Axial Positional Embedding for PyTorch ☆70 · Updated 2 weeks ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆60 · Updated 2 years ago
- ADAS is short for Adaptive Step Size: an optimizer that, unlike other optimizers that just normalize the derivative, fine-tunes th… ☆85 · Updated 4 years ago
- Unofficial PyTorch implementation of Google's FNet: Mixing Tokens with Fourier Transforms. With checkpoints. ☆72 · Updated 2 years ago
- HMMs in PyTorch ☆135 · Updated 3 years ago
- Implementation of Nyström Self-attention, from the paper Nyströmformer ☆127 · Updated last year
- A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses and Loggers to better integrate pytorch-lightning with transfor… ☆47 · Updated last year
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆225 · Updated 2 years ago
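For the FAVOR+ item above (the teddykoker.com write-up): the softmax kernel exp(q·k) can be estimated with positive random features phi(x) = exp(Wx - ||x||^2 / 2) / sqrt(m). Here is a minimal NumPy sketch under simplifying assumptions: the projection rows are iid Gaussian (the paper's FAVOR+ uses orthogonal rows and subtracts a max for numerical stability), and the function names are hypothetical, not those of the linked repo.

```python
import numpy as np

def favor_features(x, w):
    # Positive random features for the softmax kernel:
    # phi(x) = exp(w @ x - |x|^2 / 2) / sqrt(m)  (Choromanski et al., 2021).
    m = w.shape[0]
    proj = x @ w.T                                  # (n, m)
    sq = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(proj - sq) / np.sqrt(m)

def favor_attention(q, k, v, n_features=64, seed=0):
    # q, k: (n, d); v: (n, d_v). Approximates softmax(q k^T / sqrt(d)) v.
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_features, d))        # iid here; FAVOR+ proper uses orthogonal rows
    q, k = q / d ** 0.25, k / d ** 0.25             # fold in the 1/sqrt(d) scaling
    qp, kp = favor_features(q, w), favor_features(k, w)
    kv = kp.T @ v                                   # (m, d_v), computed once
    z = qp @ kp.sum(axis=0)                         # (n,) row normalizers
    return (qp @ kv) / z[:, None]
```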
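And for the sharpness-aware minimization item: one SAM update is two gradient evaluations. Perturb the weights by rho * g / ||g|| toward the local worst case, take the gradient there, then apply that gradient at the original weights. A minimal TensorFlow 2 sketch follows; it is not the linked repo's API, and `sam_train_step` and its arguments are illustrative.

```python
import tensorflow as tf

def sam_train_step(model, optimizer, loss_fn, x, y, rho=0.05):
    # One update of sharpness-aware minimization (Foret et al., 2021).
    # Step 1: gradient at the current weights.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # Climb to the approximate worst case inside an L2 ball of radius rho:
    # epsilon = rho * g / ||g||.
    grad_norm = tf.linalg.global_norm(grads)
    eps = [g * rho / (grad_norm + 1e-12) for g in grads]
    for var, e in zip(model.trainable_variables, eps):
        var.assign_add(e)
    # Step 2: gradient at the perturbed weights.
    with tf.GradientTape() as tape:
        perturbed_loss = loss_fn(y, model(x, training=True))
    sam_grads = tape.gradient(perturbed_loss, model.trainable_variables)
    # Restore the original weights, then descend along the SAM gradient.
    for var, e in zip(model.trainable_variables, eps):
        var.assign_sub(e)
    optimizer.apply_gradients(zip(sam_grads, model.trainable_variables))
    return loss
```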