microsoft / ReinMaxLinks

Beyond Straight-Through

☆100

Alternatives and similar repositories for ReinMax

Users that are interested in ReinMax are comparing it to the libraries listed below

Sorting:

lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆118Updated last month
igul222 / plaid
☆104Updated 2 years ago
lucidrains / discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in Pytorch
☆88Updated 2 years ago
louaaron / Reflected-Diffusion
[ICML 2023] Reflected Diffusion Models (https://arxiv.org/abs/2304.04740)
☆155Updated last year
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆66Updated 10 months ago
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆34Updated 2 years ago
thjashin / multires-conv
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
☆125Updated last year
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆79Updated last year
sustcsonglin / mamba-triton
☆49Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100Updated 11 months ago
gregorbachmann / scaling_mlps
☆51Updated last year
microsoft / SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
☆30Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80Updated last year
lucidrains / self-reasoning-tokens-pytorch
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
☆56Updated last year
minyoungg / vqtorch
☆136Updated last year
HKUNLP / diffusion-vs-ar
[ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning"
☆69Updated 5 months ago
lucidrains / hourglass-transformer-pytorch
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
☆91Updated 3 years ago
gregorbachmann / Next-Token-Failures
☆89Updated last year
ermongroup / fast_feedforward_computation
Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021
☆27Updated 3 years ago
sustcsonglin / gated_linear_attention_layer
☆32Updated last year
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆66Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆59Updated last month
AllanYangZhou / nfn
NF-Layers for constructing neural functionals.
☆88Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆70Updated last year
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆136Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
google-deepmind / emergent_in_context_learning
☆84Updated last year
yuPeiyu98 / Latent-Diffusion-EBM
[ICML 2022] Latent Diffusion Energy-Based Model for Interpretable Text Modeling
☆66Updated 3 years ago