dvruette / gidd
Code accompanying the paper "Generalized Interpolating Discrete Diffusion"
☆73, updated 3 weeks ago
Alternatives and similar repositories for gidd:
Users who are interested in gidd are comparing it to the libraries listed below.
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun (☆46, updated last month)
- Focused on fast experimentation and simplicity (☆71, updated 3 months ago)
- ☆52, updated 3 weeks ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" (☆90, updated last week)
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. (☆124, updated 2 months ago)
- ☆67, updated last month
- Stick-breaking attention (☆50, updated last month)
- Official Jax Implementation of MD4 Masked Diffusion Models (☆74, updated last month)
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" (☆39, updated this week)
- Here we will test various linear attention designs. (☆60, updated 11 months ago)
- When it comes to optimizers, it's always better to be safe than sorry (☆217, updated 2 weeks ago)
- Normalized Transformer (nGPT) (☆167, updated 4 months ago)
- supporting pytorch FSDP for optimizers (☆80, updated 4 months ago)
- Remasking Discrete Diffusion Models with Inference-Time Scaling (☆18, updated last month)
- research impl of Native Sparse Attention (2502.11089) (☆53, updated last month)
- Official Code for Paper "Think While You Generate: Discrete Diffusion with Planned Denoising" [ICLR 2025] (☆56, updated 3 weeks ago)
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule (☆152, updated last month)
- ☆76, updated 9 months ago
- WIP (☆93, updated 8 months ago)
- ☆95, updated last year
- RWKV-7: Surpassing GPT (☆83, updated 5 months ago)
- This repo is based on https://github.com/jiaweizzhao/GaLore (☆26, updated 7 months ago)
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models (☆148, updated 3 weeks ago)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters (☆126, updated 4 months ago)
- ☆77, updated 7 months ago
- FlexTok: Resampling Images into 1D Token Sequences of Flexible Length (☆98, updated 2 weeks ago)
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" (☆71, updated 5 months ago)
- ☆22, updated 9 months ago
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" (☆51, updated 2 months ago)
- https://x.com/BlinkDL_AI/status/1884768989743882276 (☆27, updated 2 months ago)