lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
☆497 · Updated last year
Alternatives and similar repositories for SoraReview
Users who are interested in SoraReview are comparing it to the repositories listed below.
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆590 · Updated 11 months ago
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ☆509 · Updated 5 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆447 · Updated 7 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆456 · Updated 9 months ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆630 · Updated 8 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆389 · Updated last year
- My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆252 · Updated last week
- Official implementation of SEED-LLaMA (ICLR 2024). ☆621 · Updated 11 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆624 · Updated 10 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆935 · Updated 2 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,858 · Updated last year
- A reading list of video generation ☆614 · Updated this week
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆477 · Updated last year
- [ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,696 · Updated this week
- Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers" ☆961 · Updated last year
- A list for Text-to-Video, Image-to-Video works ☆244 · Updated 3 months ago
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. ☆1,870 · Updated 5 months ago
- Next-Token Prediction is All You Need ☆2,195 · Updated 5 months ago
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ☆693 · Updated this week
- Efficient Multimodal Large Language Models: A Survey ☆371 · Updated 4 months ago
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation ☆1,206 · Updated this week
- MiniSora: A community aims to explore the implementation path and future development direction of Sora. ☆1,265 · Updated 6 months ago
- Emu Series: Generative Multimodal Models from BAAI ☆1,744 · Updated 11 months ago
- Awesome Unified Multimodal Models ☆671 · Updated 3 weeks ago
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆681 · Updated last month
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆860 · Updated 4 months ago
- This repo contains the code for 1D tokenizer and generator ☆1,023 · Updated 5 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆240 · Updated last year
- A collection of awesome video generation studies. ☆623 · Updated 3 weeks ago
- PyTorch implementation of RCG https://arxiv.org/abs/2312.03701 ☆927 · Updated 11 months ago