kuleshov-group / bd3lms
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
☆749 · Updated 3 weeks ago
Alternatives and similar repositories for bd3lms
Users interested in bd3lms often compare it with the libraries listed below.
- Dream 7B, a large diffusion language model ☆873 · Updated last month
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,275 · Updated last month
- [ICLR 2025 Spotlight🔥] Official implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆567 · Updated 5 months ago
- Official implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆255 · Updated last month
- [ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆253 · Updated 2 months ago
- Official PyTorch implementation for the ICLR 2025 paper "Scaling up Masked Diffusion Models on Text" ☆267 · Updated 7 months ago
- Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion" ☆948 · Updated 4 months ago
- An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL ☆1,002 · Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆922 · Updated last month
- [NeurIPS 2024] Simple and Effective Masked Diffusion Language Model ☆466 · Updated 2 months ago
- [ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834) ☆612 · Updated last year
- Scaling Diffusion Transformers with Mixture of Experts ☆356 · Updated 10 months ago
- This repo contains the code for a 1D tokenizer and generator ☆975 · Updated 4 months ago
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer ☆621 · Updated 9 months ago
- Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper ☆700 · Updated last month
- PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI ☆1,184 · Updated last month
- Long-RL: Scaling RL to Long Sequences ☆568 · Updated this week
- [ICLR 2025 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think ☆1,251 · Updated 4 months ago
- Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch ☆401 · Updated 6 months ago
- H-Net: Hierarchical Network with Dynamic Chunking ☆632 · Updated last week
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training ☆456 · Updated this week
- FlexTok: Resampling Images into 1D Token Sequences of Flexible Length ☆234 · Updated 2 months ago
- [ICLR 2025] VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation ☆374 · Updated 3 months ago
- [ICML 2024 Oral] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆820 · Updated 10 months ago
- Implementation of Autoregressive Diffusion in PyTorch ☆399 · Updated 9 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆320 · Updated this week
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆446 · Updated 6 months ago
- Muon is an optimizer for hidden layers in neural networks ☆1,454 · Updated 3 weeks ago
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation ☆367 · Updated this week
- Muon is Scalable for LLM Training ☆1,240 · Updated this week