showlab / Show-oLinks

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

☆1,742

Alternatives and similar repositories for Show-o

Users that are interested in Show-o are comparing it to the libraries listed below

Sorting:

TencentARC / SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
☆956Updated 3 months ago
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,879Updated last year
bytedance / 1d-tokenizer
This repo contains the code for 1D tokenizer and generator
☆1,052Updated 7 months ago
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,216Updated 7 months ago
FoundationVision / Infinity
[CVPR 2025 Oral]Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
☆1,471Updated 4 months ago
showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆725Updated 2 weeks ago
yifan123 / flow_grpo
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
☆1,463Updated last week
lucidrains / transfusion-pytorch
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
☆1,241Updated last week
ZiyuGuo99 / Image-Generation-CoT
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
☆817Updated 5 months ago
JiuhaiChen / BLIP3o
Official implementation of BLIP3o-Series
☆1,536Updated this week
LTH14 / mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
☆1,766Updated last year
sihyun-yu / REPA
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
☆1,368Updated 7 months ago
ChaofanTao / Autoregressive-Models-in-Vision-Survey
[TMLR 2025🔥] A survey for the autoregressive models in vision.
☆725Updated this week
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆594Updated last year
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆718Updated last month
snap-research / Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
☆631Updated last year
lucidrains / magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
☆642Updated 9 months ago
AIDC-AI / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆805Updated 2 months ago
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think
☆574Updated this week
YingqingHe / Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
☆509Updated 6 months ago
Gen-Verse / MMaDA
[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
☆1,445Updated last week
Vchitect / VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
☆1,269Updated last week
willisma / SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
☆988Updated last year
Alpha-VLLM / Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini…
☆626Updated last week
LMM101 / Awesome-Multimodal-Next-Token-Prediction
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
☆451Updated 9 months ago
AlonzoLeeeooo / awesome-video-generation
A collection of awesome video generation studies.
☆650Updated last week
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆663Updated 2 months ago
GAIR-NLP / anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
☆810Updated 4 months ago
baaivision / NOVA
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
☆581Updated last month
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,409Updated 8 months ago