raymin0223 / mixture_of_recursionsLinks

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025)

☆508

Alternatives and similar repositories for mixture_of_recursions

Users that are interested in mixture_of_recursions are comparing it to the libraries listed below

Sorting:

goombalab / hnet
H-Net: Hierarchical Network with Dynamic Chunking
☆781Updated last month
tensorgi / TPA
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
☆426Updated 3 weeks ago
lucidrains / native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
☆780Updated 3 months ago
ZHZisZZ / dllm
dLLM: Simple Diffusion Language Modeling
☆950Updated this week
NVlabs / hymba
☆200Updated 11 months ago
Haiyang-W / TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
☆578Updated 9 months ago
apple / ml-sigmoid-attention
☆301Updated 7 months ago
DreamLM / Dream
Dream 7B, a large diffusion language model
☆1,081Updated last month
kuleshov-group / bd3lms
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
☆888Updated 4 months ago
ML-GSAI / SMDM
Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text"
☆339Updated 11 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆376Updated 2 months ago
MuLabPKU / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
☆410Updated 2 months ago
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆231Updated last month
JinjieNi / MegaDLMs
GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr…
☆272Updated last week
pengzhangzhi / Open-dLLM
The most open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.
☆458Updated last week
shaochenze / calm
Official implementation of "Continuous Autoregressive Language Models"
☆584Updated last week
alexiglad / EBT
PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning
☆558Updated last week
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆356Updated 11 months ago
VILA-Lab / Awesome-DLMs
The official GitHub repo for the survey paper "A Survey on Diffusion Language Models".
☆478Updated this week
MoonshotAI / Kimi-Linear
☆1,184Updated this week
weigao266 / Awesome-Efficient-Arch
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
☆359Updated last week
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆676Updated last month
apple / ml-cross-entropy
☆550Updated last month
HKUNLP / DiffuLLaMA
[ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models
☆334Updated 5 months ago
dllm-reasoning / d1
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
☆361Updated 4 months ago
seal-rg / recurrent-pretraining
Pretraining and inference code for a large-scale depth-recurrent language model
☆847Updated last month
RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆274Updated 9 months ago
ZihanWang314 / CoE
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
☆223Updated 2 weeks ago
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆451Updated 6 months ago
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆376Updated last month