microsoft / ReinMax
Beyond Straight-Through
☆90 · Updated last year
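For context on the repo's tagline: "straight-through" refers to the straight-through (ST) Gumbel estimator, the standard trick for backpropagating through discrete samples, which ReinMax improves on. Below is a minimal, dependency-free sketch of that ST baseline (not ReinMax itself); the function names `st_sample` and `st_grad` are illustrative, not from the repo.

```python
# Hedged sketch of the straight-through (ST) Gumbel estimator — the baseline
# that ReinMax goes "beyond". Pure Python for clarity; real code uses PyTorch.
import math
import random

def softmax(logits, tau=1.0):
    # Numerically stable softmax of logits / tau.
    scaled = [x / tau for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def st_sample(logits, tau=1.0, rng=random):
    # Forward pass: draw a hard one-hot sample via the Gumbel-max trick.
    g = [-math.log(-math.log(rng.random())) for _ in logits]
    k = max(range(len(logits)), key=lambda i: logits[i] + g[i])
    hard = [1.0 if i == k else 0.0 for i in range(len(logits))]
    # Backward pass (conceptually): pretend the forward op was the soft
    # relaxation softmax(logits / tau) — that is the ST approximation.
    soft = softmax(logits, tau)
    return hard, soft

def st_grad(logits, upstream, tau=1.0):
    # ST gradient w.r.t. logits: the vector-Jacobian product of
    # softmax(logits / tau), ignoring the non-differentiable argmax that
    # was actually used in the forward pass. This first-order mismatch is
    # the bias that ReinMax's second-order correction targets.
    p = softmax(logits, tau)
    dot = sum(u * pi for u, pi in zip(upstream, p))
    return [pi * (u - dot) / tau for pi, u in zip(p, upstream)]
```

Because the backward pass uses the softmax Jacobian, the components of `st_grad` always sum to zero, and the forward output stays exactly one-hot — which is why ST is popular despite its biased gradients.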
Related projects
Alternatives and complementary repositories for ReinMax
- A curated list of awesome discrete diffusion model resources ☆61 · Updated this week
- ☆75 · Updated last year
- [ICML 2023] Reflected Diffusion Models (https://arxiv.org/abs/2304.04740) ☆157 · Updated last year
- Reparameterized Discrete Diffusion Models for Text Generation ☆90 · Updated last year
- ☆113 · Updated 8 months ago
- Transformers with doubly stochastic attention ☆40 · Updated 2 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆94 · Updated last year
- Understanding the Difficulty of Training Transformers ☆45 · Updated 2 years ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆33 · Updated last year
- [ICML 2022] Learning Iterative Reasoning through Energy Minimization ☆43 · Updated last year
- ☆46 · Updated last month
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆49 · Updated last month
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 · Updated last year
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving" (ICML 2021) ☆26 · Updated 3 years ago
- ☆65 · Updated 7 months ago
- Repository for the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆108 · Updated 7 months ago
- Official code for "Maximum Likelihood Training for Score-Based Diffusion ODEs by High-Order Denoising Score Matching" (ICML 2022) ☆53 · Updated 2 years ago
- Code for the paper https://arxiv.org/abs/2205.14987v2 ☆43 · Updated 6 months ago
- ☆32 · Updated 9 months ago
- Code for the paper "Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions" ☆83 · Updated 3 years ago
- ☆58 · Updated 2 years ago
- Sequence Modeling with Structured State Spaces ☆60 · Updated 2 years ago
- ☆50 · Updated 4 months ago
- Implementation of Discrete Key / Value Bottleneck, in PyTorch ☆87 · Updated last year
- ☆28 · Updated 7 months ago
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective" ☆40 · Updated 6 months ago
- Standalone Product Key Memory module in PyTorch, for augmenting Transformer models ☆72 · Updated 3 months ago
- NF-Layers for constructing neural functionals ☆75 · Updated 10 months ago
- Sparse Backpropagation for Mixture-of-Expert Training ☆22 · Updated 4 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 · Updated last year