facebookresearch / metamorph
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
★232 · Updated last week
Alternatives and similar repositories for metamorph
Users who are interested in metamorph are comparing it to the repositories listed below.
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★430 · Updated 5 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ★182 · Updated 2 months ago
- ★311 · Updated last month
- PyTorch implementation of NEPA ★303 · Updated last week
- ★189 · Updated last year
- Official Implementation of Paper Transfer between Modalities with MetaQueries ★296 · Updated 3 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★417 · Updated 9 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ★236 · Updated 8 months ago
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ★98 · Updated 11 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ★185 · Updated 8 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ★305 · Updated 4 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ★138 · Updated 8 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ★129 · Updated last year
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding ★191 · Updated last month
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ★205 · Updated last week
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ★89 · Updated last year
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ★238 · Updated 6 months ago
- The author's implementation of FUDOKI, a multimodal large language model purely based on discrete flow matching. ★67 · Updated last month
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ★95 · Updated 11 months ago
- (ICCV 2025) "Principal Components" Enable A New Language of Images ★79 · Updated 6 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ★198 · Updated 4 months ago
- The code repository of UniRL ★51 · Updated 8 months ago
- [ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potenti… ★354 · Updated this week
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs ★176 · Updated 3 months ago
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding ★503 · Updated 2 months ago
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ★198 · Updated 3 weeks ago
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio… ★98 · Updated last week
- [ICCV 2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/To… ★151 · Updated 6 months ago
- ★80 · Updated 7 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ★236 · Updated 5 months ago