tonyduan / transformer-blocks
Multi-Head Attention, Transformer, Perceiver, Linear Attention.
☆11 Updated last year
Alternatives and similar repositories for transformer-blocks
Users who are interested in transformer-blocks are comparing it to the libraries listed below.
- A differentiation API for PyTorch ☆30 Updated 5 years ago
- ☆15 Updated 4 years ago
- An example showing how to use jax to train resnet50 on multi-node multi-GPU ☆20 Updated 2 years ago
- Code for "'Hey, that's not an ODE:' Faster ODE Adjoints via Seminorms" (ICML 2021) ☆87 Updated 2 years ago
- ☆20 Updated 3 years ago
- Code for 'Periodic Activation Functions Induce Stationarity' (NeurIPS 2021) ☆19 Updated 3 years ago
- Relative gradient optimization of the Jacobian term in unsupervised deep learning, NeurIPS 2020 ☆21 Updated 4 years ago
- Convex potential flows ☆83 Updated 3 years ago
- Efficient Householder Transformation in PyTorch ☆66 Updated 3 years ago
- Monotone operator equilibrium networks ☆52 Updated 5 years ago
- A PyTorch implementation of Conditional PixelCNNs ☆27 Updated 7 years ago
- Pytorch implementation of 'Semi-Implicit Methods for Deep Neural Networks' ☆25 Updated 6 years ago
- Code for the article "What if Neural Networks had SVDs?", to be presented as a spotlight paper at NeurIPS 2020. ☆75 Updated 11 months ago
- ☆22 Updated 5 years ago
- MintNet: Building Invertible Neural Networks with Masked Convolutions ☆39 Updated 4 years ago
- Code required to reproduce the experiments in Auxiliary Variational MCMC ☆17 Updated 6 years ago
- ☆47 Updated last year
- Pytorch implementation for "Particle Flow Bayes' Rule" ☆14 Updated 6 years ago
- Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation ☆69 Updated 4 years ago
- Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch ☆92 Updated 4 years ago
- PyTorch implementation of Continuously Indexed Flows paper, with many baseline normalising flows ☆31 Updated 3 years ago
- Riemannian Convex Potential Maps ☆67 Updated 2 years ago
- This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent… ☆72 Updated last year
- A public repository for our paper, Rao-Blackwellized Stochastic Gradients for Discrete Distributions ☆22 Updated 6 years ago
- Supplementary code for the paper "Meta-Solver for Neural Ordinary Differential Equations" https://arxiv.org/abs/2103.08561 ☆25 Updated 4 years ago
- Jupyter Notebook corresponding to 'Going with the Flow: An Introduction to Normalizing Flows' ☆26 Updated 4 years ago
- Code associated with our paper "Learning Group Structure and Disentangled Representations of Dynamical Environments" ☆15 Updated 2 years ago
- Hierarchical variational models for physics. ☆17 Updated 5 years ago
- Continuous-time gradient flow for generative modeling and variational inference ☆32 Updated 6 years ago
- Lie Algebra Convolutional Network implementation ☆43 Updated 3 years ago