lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
☆500, updated last year
Alternatives and similar repositories for SoraReview
Users that are interested in SoraReview are comparing it to the libraries listed below
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content (☆594, updated last year)
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). (☆512, updated 6 months ago)
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey (☆451, updated 9 months ago)
- Official implementation of SEED-LLaMA (ICLR 2024). (☆631, updated last year)
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation (☆459, updated 10 months ago)
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks (☆389, updated last year)
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (☆637, updated last year)
- A reading list of video generation (☆619, updated this week)
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation (☆1,879, updated last year)
- Next-Token Prediction is All You Need (☆2,216, updated 7 months ago)
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" (☆486, updated last year)
- Implementation of MagViT2 Tokenizer in Pytorch (☆642, updated 9 months ago)
- My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" (☆258, updated 2 weeks ago)
- SEED-Voken: A Series of Powerful Visual Tokenizers (☆956, updated 3 months ago)
- A list for Text-to-Video, Image-to-Video works (☆243, updated 4 months ago)
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation (☆1,269, updated last week)
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. (☆1,879, updated 6 months ago)
- MiniSora: A community aims to explore the implementation path and future development direction of Sora. (☆1,264, updated 8 months ago)
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" (☆861, updated 5 months ago)
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. (☆725, updated 2 weeks ago)
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language (☆656, updated last year)
- Efficient Multimodal Large Language Models: A Survey (☆373, updated 5 months ago)
- Multimodal Models in Real World (☆549, updated 8 months ago)
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. (☆1,751, updated this week)
- [TMLR 2025 🔥] A survey for the autoregressive models in vision. (☆725, updated this week)
- Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter. (☆493, updated 4 months ago)
- Emu Series: Generative Multimodal Models from BAAI (☆1,746, updated last year)
- VideoSys: An easy and efficient system for video generation (☆2,004, updated last month)
- Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers" (☆995, updated last year)
- A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models". (☆1,053, updated 2 years ago)