wiedersehne / Paramixer
Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022)
☆20 Updated 2 years ago
Alternatives and similar repositories for Paramixer:
Users who are interested in Paramixer are comparing it to the libraries listed below.
- Implementation of Kronecker Attention in Pytorch ☆18 Updated 4 years ago
- Implementation of LogAvgExp for Pytorch ☆32 Updated 2 years ago
- ☆40 Updated last year
- ☆11 Updated 2 years ago
- Repository for the PopulAtion Parameter Averaging (PAPA) paper ☆26 Updated 9 months ago
- Official repository for our ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology ☆35 Updated 3 years ago
- Official code for NeurIPS paper "Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach". ☆16 Updated 2 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆15 Updated 3 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆35 Updated 3 years ago
- ☆21 Updated last year
- Visual Representation Learning Benchmark for Self-Supervised Models ☆35 Updated 9 months ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 Updated 2 years ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆21 Updated last year
- ☆19 Updated 3 years ago
- Code base for SRSGD. ☆28 Updated 4 years ago
- A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in y… ☆45 Updated 2 years ago
- ☆41 Updated 3 years ago
- Simple notebooks to learn diffusion models on toy datasets ☆17 Updated last year
- Successfully training approximations to full-rank matrices for efficiency in deep learning. ☆16 Updated 4 years ago
- Official Pytorch implementation of the paper: "Locally Shifted Attention With Early Global Integration" ☆15 Updated 3 years ago
- ☆25 Updated 3 years ago
- ☆19 Updated last year
- Code for the ICML 2021 and ICLR 2022 papers: Skew Orthogonal Convolutions, Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100 ☆18 Updated 2 years ago
- Nonparametric Score Estimators, ICML 2020 ☆36 Updated 3 years ago
- Robust Optimal Transport code ☆40 Updated 2 years ago
- Implementation for <Orthogonal Over-Parameterized Training> in CVPR'21. ☆19 Updated 3 years ago
- Official code for "Visual Attention Emerges from Recurrent Sparse Reconstruction" (ICML 2022) ☆35 Updated 2 years ago
- PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations ☆16 Updated 4 years ago
- Efficient Neural Network Loss Landscape Generation ☆10 Updated 5 years ago