lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
☆497 · Updated last year
Alternatives and similar repositories for SoraReview
Users interested in SoraReview are comparing it to the libraries listed below.
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆585 · Updated 9 months ago
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ☆494 · Updated 3 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆446 · Updated 6 months ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆622 · Updated 6 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆386 · Updated last year
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆613 · Updated 9 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆452 · Updated 8 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆618 · Updated 10 months ago
- My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆249 · Updated last week
- [CVPR 2024 Highlight] VBench - We Evaluate Video Generation ☆1,126 · Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆920 · Updated last month
- A reading list of video generation ☆606 · Updated last week
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆461 · Updated 11 months ago
- [TMLR 2025 🔥] A survey of autoregressive models in vision. ☆660 · Updated last week
- A list for Text-to-Video, Image-to-Video works ☆241 · Updated 2 months ago
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆651 · Updated 9 months ago
- Emu Series: Generative Multimodal Models from BAAI ☆1,739 · Updated 10 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,816 · Updated 11 months ago
- Efficient Multimodal Large Language Models: A Survey ☆362 · Updated 3 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆824 · Updated last year
- [ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,625 · Updated this week
- PyTorch implementation of RCG https://arxiv.org/abs/2312.03701 ☆918 · Updated 10 months ago
- Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding ☆626 · Updated 7 months ago
- Next-Token Prediction is All You Need ☆2,173 · Updated 4 months ago
- Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI ☆1,182 · Updated last month
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆861 · Updated 2 months ago
- Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers" ☆928 · Updated last year
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. ☆1,854 · Updated 3 months ago
- This repo contains the code for 1D tokenizer and generator ☆963 · Updated 4 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆230 · Updated last year