wiedersehne / Paramixer
Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022)
☆20 · Updated 2 years ago
Alternatives and similar repositories for Paramixer:
Users who are interested in Paramixer are comparing it to the libraries listed below.
- Implementation of Kronecker Attention in PyTorch ☆18 · Updated 4 years ago
- Implementation of the Remixer Block from the Remixer paper, in PyTorch ☆35 · Updated 3 years ago
- Code base for SRSGD. ☆28 · Updated 4 years ago
- Official code for NeurIPS paper "Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach". ☆16 · Updated 2 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
- ☆41 · Updated 3 years ago
- PyTorch implementation of HashedNets ☆36 · Updated last year
- Architecture embeddings independent from the parametrization of the search space ☆15 · Updated 3 years ago
- PyTorch implementation of IRMAE (https://arxiv.org/abs/2010.00679) ☆45 · Updated 2 years ago
- ☆21 · Updated 2 years ago
- Successfully training approximations to full-rank matrices for efficiency in deep learning. ☆16 · Updated 4 years ago
- Simple notebooks to learn diffusion models on toy datasets ☆17 · Updated 2 years ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆21 · Updated last year
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆15 · Updated 3 years ago
- Piecewise Linear Functions (PWL) implementation in PyTorch ☆51 · Updated 2 years ago
- ☆41 · Updated last year
- ☆36 · Updated 3 years ago
- ☆25 · Updated 4 years ago
- Meta Optimal Transport ☆98 · Updated last year
- Convolutions and more as einsum for PyTorch ☆14 · Updated 8 months ago
- ☆20 · Updated last year
- Self-Distillation with weighted ground-truth targets; ResNet and Kernel Ridge Regression ☆17 · Updated 3 years ago
- Official implementation of "UNAS: Differentiable Architecture Search Meets Reinforcement Learning", CVPR 2020 Oral ☆60 · Updated last year
- Implementation of Spectral Leakage and Rethinking the Kernel Size in CNNs in PyTorch ☆14 · Updated 4 years ago
- Cyclic Differentiable Architecture Search ☆36 · Updated 3 years ago
- Efficient Neural Network Loss Landscape Generation ☆10 · Updated 5 years ago
- A collection of optimizers, some arcane others well known, for Flax. ☆29 · Updated 3 years ago
- Implementation of ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021) ☆15 · Updated 3 years ago
- Official implementation for "Minimax Active Learning" in PyTorch. ☆9 · Updated 4 years ago
- ☆19 · Updated 3 years ago