microsoft / ReinMax
Beyond Straight-Through
☆90 Updated last year
Related projects
Alternatives and complementary repositories for ReinMax
- ☆75 Updated last year
- A curated list for awesome discrete diffusion models resources. ☆67 Updated last week
- ☆118 Updated 8 months ago
- [ICML 2023] Reflected Diffusion Models (https://arxiv.org/abs/2304.04740) ☆157 Updated last year
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆95 Updated last year
- ☆69 Updated 8 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆52 Updated last month
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆26 Updated 3 years ago
- Sparse Backpropagation for Mixture-of-Expert Training ☆24 Updated 4 months ago
- Implementation of Discrete Key / Value Bottleneck, in Pytorch ☆87 Updated last year
- ☆28 Updated 7 months ago
- ICML 2022: Learning Iterative Reasoning through Energy Minimization ☆43 Updated last year
- Reparameterized Discrete Diffusion Models for Text Generation ☆90 Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆109 Updated 8 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 Updated last year
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆42 Updated 6 months ago
- Transformers with doubly stochastic attention ☆40 Updated 2 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆40 Updated 3 months ago
- ☆51 Updated 5 months ago
- Blog post ☆16 Updated 9 months ago
- Official code for "Maximum Likelihood Training for Score-Based Diffusion ODEs by High-Order Denoising Score Matching" (ICML 2022) ☆53 Updated 2 years ago
- [ICML 2022] Latent Diffusion Energy-Based Model for Interpretable Text Modeling ☆63 Updated 2 years ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆61 Updated 6 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆33 Updated last year
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆46 Updated last year
- Sequence Modeling with Structured State Spaces ☆60 Updated 2 years ago
- Code for paper "Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions" ☆83 Updated 3 years ago
- ☆58 Updated 2 years ago