showlab / Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

☆725

Alternatives and similar repositories for Awesome-Unified-Multimodal-Models

Users who are interested in Awesome-Unified-Multimodal-Models are comparing it to the repositories listed below.

AIDC-AI / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆805 Updated 2 months ago
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think
☆574 Updated this week
ChaofanTao / Autoregressive-Models-in-Vision-Survey
[TMLR 2025🔥] A survey of autoregressive models in vision.
☆725 Updated this week
LMM101 / Awesome-Multimodal-Next-Token-Prediction
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
☆451 Updated 9 months ago
Purshow / Awesome-Unified-Multimodal
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
☆319 Updated last week
ByteVisionLab / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆393 Updated 2 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆718 Updated last month
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,751 Updated this week
YingqingHe / Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
☆509 Updated 6 months ago
lxa9867 / Awesome-Autoregressive-Visual-Generation
This is a repo to track the latest autoregressive visual generation papers.
☆405 Updated 4 months ago
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆849 Updated 2 months ago
FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆425 Updated last month
dvlab-research / VisionZip
Official repository for VisionZip (CVPR 2025)
☆363 Updated 3 months ago
daixiangzi / Awesome-Token-Compress
A paper list of recent works on token compression for ViT and VLM
☆709 Updated this week
bytedance / 1d-tokenizer
This repo contains the code for 1D tokenizer and generator
☆1,052 Updated 7 months ago
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆398 Updated 6 months ago
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,033 Updated 3 weeks ago
TencentARC / SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
☆956 Updated 3 months ago
Osilly / Vision-R1
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages …
☆717 Updated last month
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆369 Updated 8 months ago
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆387 Updated 10 months ago
rongyaofang / GoT
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
☆291 Updated 3 weeks ago
ziqihuangg / Awesome-Evaluation-of-Visual-Generation
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
☆368 Updated last month
wdrink / SimpleAR
PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
☆411 Updated 4 months ago
lucidrains / transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
☆1,241 Updated last week
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆751 Updated last month
Visual-Agent / DeepEyes
☆883 Updated this week
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆613 Updated 7 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆644 Updated last month
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,227 Updated last week