igul222 / plaid
☆84 · Updated last year
Alternatives and similar repositories for plaid:
Users who are interested in plaid are comparing it to the libraries listed below.
- Reparameterized Discrete Diffusion Models for Text Generation ☆93 · Updated 2 years ago
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆88 · Updated 2 months ago
- DiffusER: Discrete Diffusion via Edit-based Reconstruction (Reid, Hellendoorn & Neubig, 2022) ☆54 · Updated last year
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆65 · Updated last year
- ☆80 · Updated 11 months ago
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" ☆31 · Updated this week
- Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control ☆66 · Updated 2 years ago
- Stick-breaking attention ☆42 · Updated last month
- Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text" ☆64 · Updated last month
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- ☆28 · Updated 3 months ago
- ☆51 · Updated 8 months ago
- ☆82 · Updated 4 months ago
- Simplified Masked Diffusion Language Model ☆265 · Updated 2 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 10 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆25 · Updated 9 months ago
- Language Quantized AutoEncoders ☆99 · Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆60 · Updated 4 months ago
- ☆30 · Updated 11 months ago
- Online Adaptation of Language Models with a Memory of Amortized Contexts (NeurIPS 2024) ☆61 · Updated 6 months ago
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆79 · Updated 11 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆105 · Updated 11 months ago
- Sparse Backpropagation for Mixture-of-Expert Training ☆28 · Updated 7 months ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆97 · Updated last year
- ☆105 · Updated 2 years ago
- ☆33 · Updated last year
- ☆121 · Updated 11 months ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆61 · Updated 6 months ago
- Directional Preference Alignment ☆54 · Updated 4 months ago
- ☆86 · Updated this week