apple / ml-diffucoder
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
☆788 Updated 6 months ago
Alternatives and similar repositories for ml-diffucoder
Users who are interested in ml-diffucoder are comparing it to the libraries listed below
- Dream 7B, a large diffusion language model ☆1,157 Updated 2 months ago
- ☆1,278 Updated 2 months ago
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆478 Updated 2 weeks ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆358 Updated 7 months ago
- Open-source release accompanying Gao et al. 2025 ☆498 Updated last month
- Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation. ☆806 Updated last month
- ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution ☆812 Updated last week
- WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups ov… ☆597 Updated 2 weeks ago
- dLLM: Simple Diffusion Language Modeling ☆1,693 Updated 3 weeks ago
- ☆385 Updated 2 months ago
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆938 Updated 7 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆307 Updated last month
- Pretraining and inference code for a large-scale depth-recurrent language model ☆861 Updated last month
- Scaling RL on advanced reasoning models ☆661 Updated 3 months ago
- [ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆949 Updated 6 months ago
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆361 Updated 8 months ago
- Large multi-modal models (L3M) pre-training. ☆229 Updated 4 months ago
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆318 Updated 2 months ago
- ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows. ☆623 Updated last week
- [ICLR'26] The official code implementation for "Cache-to-Cache: Direct Semantic Communication Between Large Language Models" ☆324 Updated last week
- OpenTinker is an RL-as-a-Service infrastructure for foundation models ☆618 Updated last week
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆469 Updated 8 months ago
- Official implementation of "Continuous Autoregressive Language Models" ☆726 Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆371 Updated last year
- codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆746 Updated last month
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding ☆193 Updated 3 weeks ago
- PyTorch-native post-training at scale ☆605 Updated this week
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆347 Updated last year
- Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text" ☆364 Updated last year
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆399 Updated last week