RadicalNumerics / RND1
RND1: Scaling Diffusion Language Models
☆172, updated 3 weeks ago
Alternatives and similar repositories for RND1
Users interested in RND1 are comparing it to the repositories listed below.
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun (☆56, updated 10 months ago)
- Here we will test various linear attention designs. (☆62, updated last year)
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning (☆137, updated last month)
- RWKV-X is a linear-complexity hybrid language model based on the RWKV architecture, integrating sparse attention to improve the model's l… (☆54, updated 3 weeks ago)
- Flash-Muon: An Efficient Implementation of Muon Optimizer (☆233, updated 7 months ago)
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make it practical in Fast and Simplex, Ro… (☆46, updated 5 months ago)
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… (☆98, updated 2 weeks ago)
- Research implementation of Native Sparse Attention (arXiv:2502.11089) (☆63, updated 11 months ago)
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" (☆54, updated 11 months ago)
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning (☆130, updated 2 months ago)
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels (☆84, updated 2 months ago)
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) (☆78, updated 2 weeks ago)
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" (☆86, updated 11 months ago)
- FlashRNN: Fast RNN Kernels with I/O Awareness (☆174, updated 3 months ago)
- Supporting code for the blog post on modular manifolds (☆115, updated 4 months ago)
- No description (☆270, updated 8 months ago)
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" (☆113, updated 8 months ago)
- Official JAX implementation of MD4 masked diffusion models (☆153, updated 11 months ago)
- Multi-turn RL training system with AgentTrainer for language-model game reinforcement learning (☆59, updated last month)
- Official implementation of the paper "ZClip: Adaptive Spike Mitigation for LLM Pre-Training" (☆144, updated 2 months ago)
- Griffin MQA + Hawk linear RNN hybrid (☆88, updated last year)
- Esoteric Language Models (☆111, updated this week)
- No description (☆67, updated 10 months ago)
- Official code implementation for Preference Alignment with Flow Matching (NeurIPS 2024) (☆66, updated last year)
- Explorations into the recently proposed Taylor Series Linear Attention (☆100, updated last year)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) (☆163, updated 9 months ago)
- Implementation of MaskBit, proposed by ByteDance AI (☆83, updated last year)
- Awesome Triton Resources (☆39, updated 9 months ago)
- Attempt to make the multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public (☆168, updated 3 weeks ago)
- https://x.com/BlinkDL_AI/status/1884768989743882276 (☆28, updated 9 months ago)