microsoft / ReinMax
Beyond Straight-Through
☆97 Updated 2 years ago
Alternatives and similar repositories for ReinMax
Users who are interested in ReinMax are comparing it to the libraries listed below.
- ☆99 Updated 2 years ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆108 Updated last month
- [ICML 2023] Reflected Diffusion Models (https://arxiv.org/abs/2304.04740) ☆157 Updated last year
- ☆63 Updated 3 years ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆65 Updated last year
- Implementation of Discrete Key / Value Bottleneck, in Pytorch ☆88 Updated last year
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆124 Updated last year
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆34 Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 Updated 9 months ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆100 Updated 2 years ago
- ☆48 Updated last year
- NF-Layers for constructing neural functionals. ☆85 Updated last year
- ☆53 Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ☆99 Updated 10 months ago
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" ☆64 Updated 4 months ago
- [ICML 2022] Latent Diffusion Energy-Based Model for Interpretable Text Modeling ☆65 Updated 3 years ago
- ☆130 Updated last year
- ICML 2022: Learning Iterative Reasoning through Energy Minimization ☆46 Updated 2 years ago
- ☆85 Updated last year
- Stick-breaking attention ☆57 Updated last week
- ☆53 Updated 8 months ago
- Transformers with doubly stochastic attention ☆46 Updated 2 years ago
- Official Jax Implementation of MD4 Masked Diffusion Models ☆106 Updated 3 months ago
- Code for paper "Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions" ☆87 Updated 4 years ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆56 Updated last year
- Accelerated First Order Parallel Associative Scan ☆182 Updated 10 months ago
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆40 Updated last year
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆59 Updated 3 months ago
- ☆37 Updated last year
- ☆32 Updated last year