peerdavid / layerwise-batch-entropy

Layerwise Batch Entropy Regularization

☆23

Alternatives and similar repositories for layerwise-batch-entropy

Users who are interested in layerwise-batch-entropy are comparing it to the libraries listed below.

jiaweizzhao / ZerO-initialization
☆75 Updated 2 years ago
dwromero / ckconv
Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/21…
☆123 Updated 2 years ago
thjashin / multires-conv
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
☆127 Updated 2 years ago
tk-rusch / LEM
Official code for Long Expressive Memory (ICLR 2022, Spotlight)
☆71 Updated 3 years ago
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆81 Updated 3 years ago
ColinQiyangLi / AdaCat
AdaCat
☆49 Updated 3 years ago
gisilvs / AEF
☆33 Updated 2 years ago
lucidrains / isab-pytorch
An implementation of (Induced) Set Attention Block, from the Set Transformers paper
☆64 Updated 2 years ago
ctlllll / SGConv
☆164 Updated 2 years ago
TomFrederik / grokking
Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'
☆38 Updated 3 years ago
facebookresearch / semi-discrete-flow
code for "Semi-Discrete Normalizing Flows through Differentiable Tessellation"
☆27 Updated 2 years ago
lucidrains / mlp-gpt-jax
A GPT, made only of MLPs, in Jax
☆58 Updated 4 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 Updated 2 years ago
ag1988 / dss
Sequence Modeling with Structured State Spaces
☆66 Updated 3 years ago
liu-ziyin / NeurIPS_2020_Snake
☆31 Updated 4 years ago
lucidrains / compositional-attention-pytorch
Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi…
☆51 Updated 3 years ago
teddykoker / learning-to-learn-jax
JAX implementation of Learning to learn by gradient descent by gradient descent
☆28 Updated 2 months ago
Newbeeer / Anytime-Auto-Regressive-Model
Code for ICLR 2021 Paper, "Anytime Sampling for Autoregressive Models via Ordered Autoencoding"
☆26 Updated 2 years ago
ischlag / fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
☆105 Updated 4 years ago
Lightning-Universe / paper-AAVAE
☆47 Updated 2 years ago
google-research / diffstride
TF/Keras code for DiffStride, a pooling layer with learnable strides.
☆124 Updated 3 years ago
michaelsdr / sinkformers
Transformers with doubly stochastic attention
☆49 Updated 3 years ago
jmtomczak / git_flow
General Invertible Transformations for Flow-based Generative Models
☆18 Updated 4 years ago
lucidrains / g-mlp-gpt
GPT, but made only out of MLPs
☆89 Updated 4 years ago
lucidrains / ESBN-pytorch
Usable implementation of Emerging Symbol Binding Network (ESBN), in Pytorch
☆25 Updated 4 years ago
rasbt / cyclemoid-pytorch
Cyclemoid implementation for PyTorch
☆90 Updated 3 years ago
optimizedlearning / mechanic
☆36 Updated last year
RobertCsordas / transformer_generalization
The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s…
☆67 Updated 2 years ago
hlml / fortuitous_forgetting
☆19 Updated 3 years ago
lucidrains / local-attention-flax
Local Attention - Flax module for Jax
☆22 Updated 4 years ago