JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆221 · Nov 6, 2025 · Updated 3 months ago
Alternatives and similar repositories for dlms-are-super-data-learners
Users who are interested in dlms-are-super-data-learners are comparing it to the libraries listed below.
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆322 · Nov 11, 2025 · Updated 3 months ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆28 · Aug 19, 2025 · Updated 5 months ago
- Weird autoencoder experiments ☆24 · Jan 26, 2026 · Updated 3 weeks ago
- The official github repo for "Training Optimal Large Diffusion Language Models", the first-ever large-scale diffusion language models sca… ☆45 · Nov 6, 2025 · Updated 3 months ago
- 🔥 A minimal training framework for scaling FLA models ☆349 · Nov 15, 2025 · Updated 3 months ago
- [Preprint] Efficient Generative Model Training via Embedded Representation Warmup ☆36 · Oct 15, 2025 · Updated 4 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated last month
- ☆30 · Oct 7, 2024 · Updated last year
- [ICLR 2026] TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models ☆427 · Jan 28, 2026 · Updated 3 weeks ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆89 · Oct 30, 2024 · Updated last year
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ☆85 · May 26, 2025 · Updated 8 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆119 · Jan 27, 2026 · Updated 3 weeks ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆60 · Jan 26, 2026 · Updated 3 weeks ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆182 · Sep 26, 2025 · Updated 4 months ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,574 · Nov 16, 2025 · Updated 3 months ago
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". ☆52 · Dec 28, 2025 · Updated last month
- Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture. Training an MDM using GPT with this repo! ☆34 · Jun 23, 2025 · Updated 7 months ago
- [NeurIPS 2024] Simple and Effective Masked Diffusion Language Model ☆626 · Sep 29, 2025 · Updated 4 months ago
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,379 · Updated this week
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆469 · May 17, 2025 · Updated 9 months ago
- Official PyTorch implementation for "Large Language Diffusion Models" ☆3,569 · Nov 12, 2025 · Updated 3 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆92 · Jul 17, 2025 · Updated 7 months ago
- Transition Models ☆144 · Oct 7, 2025 · Updated 4 months ago
- Exploration of automated dataset selection approaches at large scales. ☆52 · Mar 4, 2025 · Updated 11 months ago
- A merged read deduplication tool capable to perform merged read deduplication on single end data. ☆12 · Sep 4, 2024 · Updated last year
- ICCV'23 | Adverse Weather Removal with Codebook Priors ☆10 · Aug 28, 2023 · Updated 2 years ago
- SDAR (Synergy of Diffusion and AutoRegression), a large diffusion language model(1.7B, 4B, 8B, 30B) ☆332 · Dec 15, 2025 · Updated 2 months ago
- ☆12 · Sep 4, 2023 · Updated 2 years ago
- ☆10 · Nov 28, 2023 · Updated 2 years ago
- A single-line modification to any (dualizer-based) optimizer that allows the optimizer to adapt to the scale of the gradients as they cha… ☆19 · Jan 11, 2025 · Updated last year
- Don't just regulate gradients like in Muon, regulate the weights too ☆31 · Jul 30, 2025 · Updated 6 months ago
- [ICML 2025] Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts ☆24 · Nov 10, 2025 · Updated 3 months ago
- Exploring how ChatGPT can be used to accelerate research in cosmology. ☆12 · Dec 12, 2022 · Updated 3 years ago
- ☆29 · Updated this week
- [NeurIPS 2025] Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking ☆22 · Oct 22, 2025 · Updated 3 months ago
- Few-shot Learning with Auxiliary Data ☆31 · Dec 8, 2023 · Updated 2 years ago
- A framework that allows you to apply Sparse AutoEncoder on any models ☆51 · Jul 11, 2025 · Updated 7 months ago
- Dream 7B, a large diffusion language model ☆1,167 · Nov 21, 2025 · Updated 2 months ago
- ☆33 · Jan 6, 2025 · Updated last year