wiedersehne / Paramixer
Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022)
☆19 · Updated last year
Related projects:
- Repo reproducing experimental results in "Addressing the Topological Defects of Disentanglement" ☆23 · Updated 2 years ago
- Code base for SRSGD. ☆28 · Updated 4 years ago
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) ☆19 · Updated 2 months ago
- PyTorch implementation of HashedNets ☆35 · Updated last year
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆36 · Updated 2 years ago
- A GPT, made only of MLPs, in Jax ☆55 · Updated 3 years ago
- Official repository for our ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology ☆36 · Updated 3 years ago
- Implementation of LogAvgExp for Pytorch ☆32 · Updated 2 years ago
- Piecewise Linear Functions (PWL) implementation in PyTorch ☆47 · Updated 2 years ago
- ☆35 · Updated 2 years ago
- Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021) ☆47 · Updated last year
- Identify a binary weight or binary weight and activation subnetwork within a randomly initialized network by only pruning and binarizing … ☆48 · Updated 2 years ago
- A collection of optimizers, some arcane others well known, for Flax. ☆29 · Updated 3 years ago
- code for "Semi-Discrete Normalizing Flows through Differentiable Tessellation" ☆24 · Updated last year
- ☆46 · Updated 5 years ago
- Code for the article "What if Neural Networks had SVDs?", to be presented as a spotlight paper at NeurIPS 2020. ☆68 · Updated last month
- 👑 Pytorch code for the Nero optimiser. ☆20 · Updated last year
- ☆18 · Updated 3 years ago
- Official implementation of the paper "Topographic VAEs learn Equivariant Capsules" ☆77 · Updated 2 years ago
- Successfully training approximations to full-rank matrices for efficiency in deep learning. ☆16 · Updated 3 years ago
- ☆29 · Updated last year
- Spectral Tensor Train Parameterization of Deep Learning Layers ☆13 · Updated 3 years ago
- Implementation of deep implicit attention in PyTorch ☆63 · Updated 3 years ago
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- Reproducible code for Augmentation paper ☆18 · Updated 5 years ago
- A pytorch implementation for the LSTM experiments in the paper: Why Gradient Clipping Accelerates Training: A Theoretical Justification f… ☆44 · Updated 4 years ago
- Implementation of Kronecker Attention in Pytorch ☆17 · Updated 4 years ago
- Efficient Householder Transformation in PyTorch ☆58 · Updated 3 years ago
- ☆22 · Updated 6 years ago
- Efficient Riemannian Optimization on Stiefel Manifold via Cayley Transform ☆34 · Updated 5 years ago