stepfun-ai/NextStep-1

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/stepfun-ai/NextStep-1)

stepfun-ai / NextStep-1

[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autogressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s Multimodal Intelligence team.

☆689

Alternatives and similar repositories for NextStep-1

Users that are interested in NextStep-1 are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

X-Omni-Team / X-Omni
View on GitHub
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
☆426Aug 26, 2025Updated 10 months ago
yifan123 / flow_grpo
View on GitHub
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
☆2,420May 7, 2026Updated 2 months ago
stepfun-ai / Step3
View on GitHub
☆453Aug 10, 2025Updated 11 months ago
ByteDance-Seed / Bagel
View on GitHub
Open-source unified multimodal model
☆6,103May 4, 2026Updated 2 months ago
stepfun-ai / Step1X-Edit
View on GitHub
A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gem…
☆2,236Apr 29, 2026Updated 2 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
baaivision / Emu3.5
View on GitHub
Native Multimodal Models are World Learners
☆1,536Dec 30, 2025Updated 6 months ago
bytetriper / RAE
View on GitHub
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
☆1,977Feb 25, 2026Updated 4 months ago
FoundationVision / Infinity
View on GitHub
[CVPR 2025 Oral]Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
☆1,579Apr 16, 2026Updated 3 months ago
black-forest-labs / Self-Flow
View on GitHub
[ICML'26] Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis
☆534May 23, 2026Updated last month
JiuhaiChen / BLIP3o
View on GitHub
Official implementation of BLIP3o-Series
☆1,663Nov 29, 2025Updated 7 months ago
wyhlovecpp / GPT-Image-Edit
View on GitHub
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
☆243Aug 15, 2025Updated 11 months ago
MCG-NJU / PixNerd
View on GitHub
[ICLR 2026] PixNerd: Pixel Neural Field Diffusion
☆182Dec 10, 2025Updated 7 months ago
Tencent-Hunyuan / HunyuanImage-2.1
View on GitHub
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
☆673Oct 14, 2025Updated 9 months ago
bytedance / 1d-tokenizer
View on GitHub
This repo contains the code for 1D tokenizer and generator
☆1,165Mar 20, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
ZitengWangNYU / Scale-RAE
View on GitHub
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
☆255Feb 13, 2026Updated 5 months ago
Gen-Verse / MMaDA
View on GitHub
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)
☆1,660Feb 14, 2026Updated 5 months ago
NVlabs / DiffusionNFT
View on GitHub
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
☆974Feb 10, 2026Updated 5 months ago
stepfun-ai / Step3-VL-10B
View on GitHub
Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its…
☆407Jan 21, 2026Updated 5 months ago
PKU-YuanGroup / UniWorld
View on GitHub
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
☆883Dec 23, 2025Updated 6 months ago
FoundationVision / LlamaGen
View on GitHub
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,959Aug 15, 2024Updated last year
tianweiy / DMD2
View on GitHub
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
☆1,402Mar 5, 2025Updated last year
baaivision / NOVA
View on GitHub
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
☆656Oct 29, 2025Updated 8 months ago
End2End-Diffusion / REPA-E
View on GitHub
[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
☆511Dec 6, 2025Updated 7 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
stepfun-ai / SteptronOss
View on GitHub
A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular…
☆576May 18, 2026Updated 2 months ago
csuhan / Tar
View on GitHub
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
☆202Sep 18, 2025Updated 10 months ago
hustvl / LightningDiT
View on GitHub
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
☆1,507Dec 16, 2025Updated 7 months ago
CostaliyA / Flow-OPD
View on GitHub
Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"
☆265Jun 24, 2026Updated 3 weeks ago
Jiawei-Yang / DeTok
View on GitHub
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
☆195Feb 24, 2026Updated 4 months ago
MiniMax-AI / VTP
View on GitHub
[ECCV 2026] Towards Scalable Pre-training of Visual Tokenizers for Generation
☆495Apr 15, 2026Updated 3 months ago
FoundationVision / UniTok
View on GitHub
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆529Nov 14, 2025Updated 8 months ago
baaivision / Emu3
View on GitHub
Next-Token Prediction is All You Need
☆2,432Jan 12, 2026Updated 6 months ago
TencentARC / SEED-Voken
View on GitHub
SEED-Voken: A Series of Powerful Visual Tokenizers
☆1,016Nov 25, 2025Updated 7 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
SkyworkAI / UniPic
View on GitHub
Open-source SOTA multi-image editing model
☆870Jul 13, 2026Updated last week
XueZeyue / DanceGRPO
View on GitHub
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
☆1,635Oct 16, 2025Updated 9 months ago
stepfun-ai / PaCoRe
View on GitHub
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
☆336Feb 5, 2026Updated 5 months ago
wdrink / SimpleAR
View on GitHub
Pytorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
☆431Jun 20, 2025Updated last year
NVlabs / rcm
View on GitHub
rCM & Causal-rCM: Leading and Unified Algorithms/Infrastructures for Bidirectional/Autoregressive Video Diffusion Distillation at Scale
☆766Jun 25, 2026Updated 3 weeks ago
CaraJ7 / T2I-R1
View on GitHub
[NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
☆433Sep 18, 2025Updated 10 months ago
facebookresearch / tuna-2
View on GitHub
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
☆738Updated this week