Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
★234 · Jan 22, 2026 · Updated last month
Alternatives and similar repositories for metamorph
Users who are interested in metamorph are comparing it to the repositories listed below.
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. ★800 · Oct 10, 2025 · Updated 4 months ago
- Official implementation of BLIP3o-Series ★1,637 · Nov 29, 2025 · Updated 3 months ago
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding ★512 · Nov 14, 2025 · Updated 3 months ago
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ★1,875 · Jan 8, 2026 · Updated last month
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" ★426 · Jun 20, 2025 · Updated 8 months ago
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection ★55 · Aug 16, 2025 · Updated 6 months ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL). ★201 · Apr 29, 2025 · Updated 10 months ago
- Official implementation of the paper "Transfer between Modalities with MetaQueries" ★304 · Oct 12, 2025 · Updated 4 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★418 · Apr 25, 2025 · Updated 10 months ago
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ★198 · Jan 7, 2026 · Updated last month
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis ★131 · May 16, 2025 · Updated 9 months ago
- Code for Scaling Language-Free Visual Representation Learning (WebSSL) ★245 · Apr 24, 2025 · Updated 10 months ago
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning ★270 · Apr 15, 2025 · Updated 10 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ★237 · May 30, 2025 · Updated 9 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ★310 · Sep 28, 2025 · Updated 5 months ago
- [CVPR 2025] PyTorch-based reimplementation of CrossFlow, as proposed in 'Flowing from Words to Pixels: A Noise-Free Framework for Cross-Mo… ★329 · Jun 8, 2025 · Updated 8 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★438 · Aug 8, 2025 · Updated 6 months ago
- This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ★349 · Jan 8, 2026 · Updated last month
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ★1,936 · Aug 15, 2024 · Updated last year
- Open-source unified multimodal model ★5,686 · Oct 27, 2025 · Updated 4 months ago
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ★638 · Oct 16, 2025 · Updated 4 months ago
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838 ★1,863 · Feb 20, 2026 · Updated last week
- Next-Token Prediction is All You Need ★2,355 · Jan 12, 2026 · Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ★1,986 · Nov 7, 2025 · Updated 3 months ago
- This is a repo to track the latest autoregressive visual generation papers. ★431 · Jun 25, 2025 · Updated 8 months ago
- [CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation ★857 · May 23, 2025 · Updated 9 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ★186 · Nov 6, 2025 · Updated 3 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ★62 · Apr 27, 2025 · Updated 10 months ago
- [CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ★1,402 · Dec 16, 2025 · Updated 2 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ★201 · Sep 18, 2025 · Updated 5 months ago
- Official repo and evaluation implementation of VSI-Bench ★673 · Aug 5, 2025 · Updated 6 months ago
- ★190 · Dec 17, 2024 · Updated last year
- ★141 · Oct 15, 2025 · Updated 4 months ago
- High-performance Image Tokenizers for VAR and AR ★303 · Apr 25, 2025 · Updated 10 months ago
- [NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT ★430 · Sep 18, 2025 · Updated 5 months ago
- ★34 · May 14, 2025 · Updated 9 months ago
- ★291 · Jul 29, 2025 · Updated 7 months ago
- [NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL ★2,024 · Nov 4, 2025 · Updated 3 months ago
- [ICCV 2025] GameFactory: Creating New Games with Generative Interactive Videos ★472 · Mar 22, 2025 · Updated 11 months ago