apple / ml-diffucoder
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
☆763 Updated 4 months ago
Alternatives and similar repositories for ml-diffucoder
Users who are interested in ml-diffucoder are comparing it to the libraries listed below.
- Dream 7B, a large diffusion language model ☆1,081 Updated last month
- ☆1,184 Updated this week
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆349 Updated 5 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆300 Updated 3 weeks ago
- Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation. ☆717 Updated last month
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆272 Updated last week
- dLLM: Simple Diffusion Language Modeling ☆950 Updated this week
- Scaling RL on advanced reasoning models ☆632 Updated last month
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆356 Updated 11 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model ☆847 Updated last month
- ☆317 Updated 2 weeks ago
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆917 Updated 5 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆451 Updated 6 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆130 Updated 3 months ago
- [ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆888 Updated 4 months ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆361 Updated 4 months ago
- codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆670 Updated 3 weeks ago
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆334 Updated 5 months ago
- Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text" ☆339 Updated 11 months ago
- ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution ☆665 Updated this week
- ☆550 Updated last month
- ☆702 Updated last month
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025) ☆508 Updated last month
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆223 Updated 2 weeks ago
- ☆907 Updated 2 weeks ago
- ☆838 Updated 2 months ago
- Official implementation of "Continuous Autoregressive Language Models" ☆584 Updated last week
- Code for the paper: "Learning to Reason without External Rewards" ☆375 Updated 4 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆327 Updated last year
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆829 Updated this week