YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

☆494

Alternatives and similar repositories for Awesome-LLMs-meet-Multimodal-Generation

Users who are interested in Awesome-LLMs-meet-Multimodal-Generation are comparing it to the repositories listed below.

showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆646 Updated this week
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & UnifiedReward-Think
☆493 Updated last week
AIDC-AI / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆513 Updated last month
ziqihuangg / Awesome-Evaluation-of-Visual-Generation
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
☆334 Updated this week
ChaofanTao / Autoregressive-Models-in-Vision-Survey
[TMLR 2025🔥] A survey for the autoregressive models in vision.
☆665 Updated last week
soraw-ai / Awesome-Text-to-Video-Generation
A list for Text-to-Video, Image-to-Video works
☆241 Updated 2 months ago
LMM101 / Awesome-Multimodal-Next-Token-Prediction
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
☆446 Updated 6 months ago
yzhang2016 / video-generation-survey
A reading list of video generation
☆607 Updated 2 weeks ago
AlonzoLeeeooo / awesome-video-generation
A collection of awesome video generation studies.
☆586 Updated last week
ByteFlow-AI / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆364 Updated 2 weeks ago
lxa9867 / Awesome-Autoregressive-Visual-Generation
This is a repo to track the latest autoregressive visual generation papers.
☆382 Updated last month
snap-research / Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
☆613 Updated 9 months ago
wdrink / SimpleAR
PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
☆390 Updated last month
baaivision / NOVA
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
☆556 Updated 3 weeks ago
rongyaofang / GoT
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
☆272 Updated 3 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆646 Updated last week
FoundationVision / UniTok
A Unified Tokenizer for Visual Generation and Understanding
☆371 Updated this week
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆453 Updated 8 months ago
Purshow / Awesome-Unified-Multimodal
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
☆268 Updated last week
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆374 Updated 3 months ago
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆585 Updated 10 months ago
AILab-CVC / SEED-X
Multimodal Models in Real World
☆530 Updated 5 months ago
VARGPT-family / VARGPT-v1.1
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
☆260 Updated 3 months ago
haoningwu3639 / StoryGen
[CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
☆250 Updated 8 months ago
atfortes / Awesome-Controllable-Diffusion
Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.
☆482 Updated last month
lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision M…
☆497 Updated last year
XueZeyue / DanceGRPO
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
☆537 Updated this week
mira-space / MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
☆461 Updated 11 months ago
ShareGPT4Omni / ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
☆229 Updated last year
zai-org / VisionReward
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
☆292 Updated 4 months ago