apple / ml-diffucoder
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
☆673 Updated 3 weeks ago
Alternatives and similar repositories for ml-diffucoder
Users who are interested in ml-diffucoder are comparing it to the repositories listed below.
- Dream 7B, a large diffusion language model ☆857 Updated last month
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆831 Updated last month
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆316 Updated last month
- Simple & Scalable Pretraining for Neural Architecture Research ☆277 Updated last week
- Scaling RL on advanced reasoning models ☆543 Updated this week
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆318 Updated 9 months ago
- Self-Adapting Language Models ☆733 Updated last month
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆417 Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆342 Updated 7 months ago
- Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed. ☆535 Updated last month
- GRadient-INformed MoE ☆264 Updated 10 months ago
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆747 Updated 3 weeks ago
- Pretraining and inference code for a large-scale depth-recurrent language model ☆806 Updated 2 weeks ago
- The official repository of ALE-Bench ☆107 Updated 2 weeks ago
- Benchmark environment for evaluating vision-language models (VLMs) on popular video games! ☆287 Updated 2 months ago
- A lightweight, local-first, and free experiment tracking Python library built on top of 🤗 Datasets and Spaces. ☆339 Updated this week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆217 Updated last month
- Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text" ☆264 Updated 7 months ago
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation ☆314 Updated this week
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆250 Updated last month
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆253 Updated 2 months ago
- Releases from OpenAI Preparedness ☆810 Updated last week
- An AI benchmark for creative, human-like problem solving using Sudoku variants ☆79 Updated 2 months ago
- ☆455 Updated 2 weeks ago
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆423 Updated this week
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆230 Updated 2 months ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆571 Updated 4 months ago
- Train high-quality text-to-image diffusion models in a data & compute efficient manner ☆501 Updated 4 months ago
- TPI-LLM: Serving 70b-scale LLMs Efficiently on Low-resource Edge Devices ☆186 Updated 2 months ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,254 Updated last month