YingqingHe / Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
☆428 · Updated last month
Alternatives and similar repositories for Awesome-LLMs-meet-Multimodal-Generation:
Users interested in Awesome-LLMs-meet-Multimodal-Generation are comparing it to the repositories listed below:
- 📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models. ☆374 · Updated last month
- A paper collection on autoregressive models in vision. ☆406 · Updated this week
- Diffusion Model-Based Image Editing: A Survey (arXiv) ☆557 · Updated this week
- A reading list on video generation ☆496 · Updated this week
- A collection of awesome video generation studies. ☆451 · Updated last month
- 🔥 Official implementation of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆253 · Updated last month
- A list of works on the evaluation of visual generation models, including evaluation metrics, models, and systems ☆243 · Updated 3 weeks ago
- A list of text-to-video and image-to-video works ☆222 · Updated 2 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆572 · Updated 3 months ago
- [CVPR 2024 Highlight] VBench - We Evaluate Video Generation ☆758 · Updated this week
- Official code of SmartEdit [CVPR 2024 Highlight] ☆295 · Updated 7 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆830 · Updated this week
- The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision M… ☆495 · Updated 11 months ago
- Official repo for the paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆405 · Updated 5 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆417 · Updated 2 months ago
- A collection of papers and code for CVPR 2025, CVPR 2024, and ECCV 2024 AIGC ☆474 · Updated 3 weeks ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆385 · Updated this week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆227 · Updated 3 weeks ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆357 · Updated last month
- This repo contains the code for a 1D tokenizer and generator ☆691 · Updated last week
- Calculates FVD, PSNR, SSIM, and LPIPS for evaluating the quality of generated or predicted videos. ☆294 · Updated last month
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆561 · Updated 4 months ago
- [CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models ☆237 · Updated 2 months ago
- A collection of awesome text-to-image generation studies. ☆517 · Updated last week
- VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE ☆281 · Updated last month
- [CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models ☆155 · Updated 4 months ago
- Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis ☆957 · Updated this week
- Official repo and evaluation implementation of VSI-Bench ☆380 · Updated 3 weeks ago
- [ICLR 2025] Repository for Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,215 · Updated last week
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆196 · Updated 7 months ago
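Several entries above evaluate generated video with metrics such as FVD, PSNR, SSIM, and LPIPS. Of these, PSNR is simple enough to show from its formula, PSNR = 10 · log10(MAX² / MSE). The sketch below is a minimal, dependency-free illustration. It assumes 8-bit frames (MAX = 255) given as flat lists of pixel values and averages per-frame PSNR over a clip. The function names `psnr` and `video_psnr` are hypothetical and are not the API of any repository listed here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], generated: Sequence[float], max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized frames given as flat pixel sequences.

    Assumption: pixels are in [0, max_value]; 255 corresponds to 8-bit frames.
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("frames must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels of the frame.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    # PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value.
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(
    reference_frames: Sequence[Sequence[float]],
    generated_frames: Sequence[Sequence[float]],
    max_value: float = 255.0,
) -> float:
    """Mean of per-frame PSNR over a clip (one common convention, assumed here)."""
    if len(reference_frames) != len(generated_frames) or len(reference_frames) == 0:
        raise ValueError("clips must be non-empty and have the same number of frames")
    scores = [psnr(r, g, max_value) for r, g in zip(reference_frames, generated_frames)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two tiny 2-pixel "frames": the first is off by 1 per pixel, the second maximally wrong.
    ref = [[0, 0], [255, 255]]
    gen = [[1, 1], [0, 0]]
    print(round(video_psnr(ref, gen), 2))  # prints 24.07
```

Averaging per-frame PSNR and computing PSNR from the MSE pooled over the whole clip give different numbers, so results are only comparable under the same convention. A single identical frame makes the clip mean infinite. FVD and LPIPS cannot be sketched this way because they rely on pretrained networks (I3D features for FVD, deep image features for LPIPS).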