hyeon95y / SparseLinear
A custom PyTorch layer for implementing extremely wide, sparse linear layers efficiently
☆49 Updated last year
Alternatives and similar repositories for SparseLinear:
Users who are interested in SparseLinear are comparing it to the libraries listed below.
- Structured matrices for compressing neural networks ☆66 Updated last year
- TensorFlow implementation and notebooks for Implicit Maximum Likelihood Estimation ☆67 Updated 2 years ago
- Easy-to-use AdaHessian optimizer (PyTorch) ☆77 Updated 4 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆102 Updated 3 years ago
- Euclidean Wasserstein-2 optimal transportation ☆44 Updated last year
- A GPT, made only of MLPs, in JAX ☆57 Updated 3 years ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆66 Updated 2 years ago
- ☆49 Updated 4 years ago
- Fast Discounted Cumulative Sums in PyTorch ☆95 Updated 3 years ago
- PyTorch library for factorized L0-based pruning. ☆45 Updated last year
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆59 Updated 3 years ago
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆154 Updated last month
- Transformers with doubly stochastic attention ☆44 Updated 2 years ago
- PyTorch implementation of HashedNets ☆36 Updated last year
- repo for paper: Adaptive Checkpoint Adjoint (ACA) method for gradient estimation in neural ODE ☆54 Updated 3 years ago
- Code for the paper: "Tensor Programs II: Neural Tangent Kernel for Any Architecture" ☆103 Updated 4 years ago
- Study on the applicability of Direct Feedback Alignment to neural view synthesis, recommender systems, geometric learning, and natural la… ☆86 Updated 2 years ago
- CIFAR-5m dataset ☆38 Updated 4 years ago
- Differentiable Algorithms and Algorithmic Supervision. ☆112 Updated last year
- Sequence Modeling with Structured State Spaces ☆61 Updated 2 years ago
- ICML 2020 Paper: Latent Variable Modelling with Hyperbolic Normalizing Flows ☆53 Updated 2 years ago
- Limitations of the Empirical Fisher Approximation ☆47 Updated 4 years ago
- Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax). ☆106 Updated 2 years ago
- Very deep VAEs in JAX/Flax ☆46 Updated 3 years ago
- Code base for SRSGD. ☆28 Updated 4 years ago
- Layerwise Batch Entropy Regularization ☆22 Updated 2 years ago
- Implementation of deep implicit attention in PyTorch ☆64 Updated 3 years ago
- Monotone operator equilibrium networks ☆51 Updated 4 years ago
- AdaCat ☆49 Updated 2 years ago
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement ☆39 Updated 4 years ago