lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
★498 · Updated last year
Alternatives and similar repositories for SoraReview
Users who are interested in SoraReview are comparing it to the repositories listed below.
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ★590 · Updated 10 months ago
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ★498 · Updated 4 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ★447 · Updated 7 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ★386 · Updated last year
- Implementation of MagViT2 Tokenizer in Pytorch ★624 · Updated 7 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★454 · Updated 8 months ago
- MiniSora: A community aims to explore the implementation path and future development direction of Sora. ★1,264 · Updated 6 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ★620 · Updated 11 months ago
- My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ★250 · Updated last month
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ★615 · Updated 9 months ago
- Next-Token Prediction is All You Need ★2,178 · Updated 5 months ago
- A list for Text-to-Video, Image-to-Video works ★242 · Updated 2 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers ★931 · Updated last month
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ★468 · Updated 11 months ago
- A reading list of video generation ★612 · Updated this week
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation ★1,161 · Updated last week
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ★675 · Updated last week
- Efficient Multimodal Large Language Models: A Survey ★365 · Updated 3 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ★235 · Updated last year
- PyTorch implementation of RCG https://arxiv.org/abs/2312.03701 ★926 · Updated 10 months ago
- Multimodal Models in Real World ★531 · Updated 6 months ago
- Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers" ★943 · Updated last year
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. ★1,858 · Updated 4 months ago
- [ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ★1,650 · Updated last week
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ★1,843 · Updated last year
- This repo contains the code for 1D tokenizer and generator ★996 · Updated 5 months ago
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. ★666 · Updated 3 weeks ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ★388 · Updated 3 months ago
- A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models". ★1,039 · Updated 2 years ago
- ★621 · Updated last year