AIDC-AI / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆77Updated this week
Alternatives and similar repositories for Awesome-Unified-Multimodal-Models
Users who are interested in Awesome-Unified-Multimodal-Models are comparing it to the repositories listed below
- Empowering Unified MLLM with Multi-granular Visual Generation☆119Updated 3 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"☆237Updated last week
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆321Updated 2 months ago
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…☆81Updated 2 weeks ago
- ☆83Updated last month
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"☆333Updated 2 weeks ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆83Updated last month
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning☆156Updated 3 weeks ago
- 📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.☆184Updated this week
- A collection of vision foundation models unifying understanding and generation.☆55Updated 4 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆99Updated 2 weeks ago
- Implements VAR+CLIP for text-to-image (T2I) generation☆136Updated 3 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction☆94Updated 2 months ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR"☆134Updated last month
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆82Updated last month
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning☆31Updated last month
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈☆43Updated 3 weeks ago
- A Unified Tokenizer for Visual Generation and Understanding☆282Updated this week
- Official implementation of UnifiedReward & UnifiedReward-Think☆304Updated this week
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project☆151Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning☆83Updated last month
- 【CVPR 2025 Oral】Official Repo for Paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea"☆118Updated last month
- Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"☆197Updated 2 weeks ago
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning☆237Updated 3 weeks ago
- PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT☆78Updated 3 weeks ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆155Updated 2 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆178Updated 7 months ago
- ☆144Updated 3 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆44Updated 2 months ago
- This is a repo to track the latest autoregressive visual generation papers.☆300Updated last week