lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
☆506 · Updated last year
Alternatives and similar repositories for SoraReview
Users who are interested in SoraReview are comparing it to the repositories listed below.
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆601 · Updated last year
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ☆533 · Updated 8 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆466 · Updated 11 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆390 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆460 · Updated last year
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆658 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆638 · Updated last year
- My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆269 · Updated 2 months ago
- A list for Text-to-Video, Image-to-Video works ☆250 · Updated 7 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆496 · Updated last year
- Implementation of MagViT2 Tokenizer in Pytorch ☆656 · Updated 11 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆984 · Updated last month
- A reading list of video generation ☆645 · Updated last week
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,916 · Updated last year
- Next-Token Prediction is All You Need ☆2,273 · Updated last month
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation ☆1,392 · Updated 3 weeks ago
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,835 · Updated last week
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆773 · Updated 2 months ago
- Multimodal Models in Real World ☆551 · Updated 10 months ago
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ☆774 · Updated last month
- Emu Series: Generative Multimodal Models from BAAI ☆1,761 · Updated last year
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. ☆1,900 · Updated 2 months ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆863 · Updated 7 months ago
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think ☆655 · Updated this week
- Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers" ☆1,057 · Updated last week
- ☆637 · Updated last year
- Efficient Multimodal Large Language Models: A Survey ☆380 · Updated 8 months ago
- Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI ☆1,295 · Updated 3 weeks ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆248 · Updated last year
- This repo contains the code for 1D tokenizer and generator ☆1,087 · Updated 9 months ago