RadicalNumerics / RND1
RND1: Scaling Diffusion Language Models
☆172 · Updated last week
Alternatives and similar repositories for RND1
Users who are interested in RND1 are comparing it to the libraries listed below.
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆56 · Updated 10 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆137 · Updated last month
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆112 · Updated 7 months ago
- DPO, but faster 🚀 ☆46 · Updated last year
- ☆66 · Updated 9 months ago
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆53 · Updated 11 months ago
- Esoteric Language Models ☆108 · Updated last month
- research impl of Native Sparse Attention (2502.11089) ☆63 · Updated 11 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆124 · Updated 2 months ago
- ☆91 · Updated last year
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆95 · Updated this week
- Flash Attention Triton kernel with support for second-order derivatives ☆129 · Updated 3 weeks ago
- Official implementation of GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ☆73 · Updated 2 weeks ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆229 · Updated 7 months ago
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro… ☆46 · Updated 4 months ago
- Here we will test various linear attention designs. ☆62 · Updated last year
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" ☆87 · Updated 11 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆118 · Updated last week
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆82 · Updated last month
- Official Jax Implementation of MD4 Masked Diffusion Models ☆151 · Updated 10 months ago
- ☆111 · Updated 2 years ago
- ☆44 · Updated 2 months ago
- Defeating the Training-Inference Mismatch via FP16 ☆176 · Updated 2 months ago
- ☆112 · Updated last year
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆65 · Updated 10 months ago
- ☆110 · Updated 4 months ago
- Easy and Efficient dLLM Fine-Tuning ☆195 · Updated last month
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆139 · Updated last week
- ☆265 · Updated 7 months ago
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning ☆58 · Updated last month