lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
★505 · Updated last year
Alternatives and similar repositories for SoraReview
Users who are interested in SoraReview compare it to the repositories listed below.
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ★599 · Updated last year
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ★526 · Updated 8 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ★462 · Updated 10 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★460 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ★390 · Updated last year
- Implementation of MagViT2 Tokenizer in Pytorch ★654 · Updated 11 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ★636 · Updated last year
- A reading list of video generation ★640 · Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ★983 · Updated 2 weeks ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ★653 · Updated last year
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ★269 · Updated last month
- A list for Text-to-Video, Image-to-Video works ★249 · Updated 6 months ago
- Emu Series: Generative Multimodal Models from BAAI ★1,760 · Updated last year
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ★665 · Updated last year
- Next-Token Prediction is All You Need ★2,261 · Updated 3 weeks ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ★494 · Updated last year
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation ★1,364 · Updated this week
- MiniSora: A community aims to explore the implementation path and future development direction of Sora. ★1,271 · Updated 9 months ago
- Efficient Multimodal Large Language Models: A Survey ★376 · Updated 7 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ★1,907 · Updated last year
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ★863 · Updated 7 months ago
- ★632 · Updated last year
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ★851 · Updated last year
- Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers" ★1,037 · Updated last month
- A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models". ★1,071 · Updated 2 years ago
- Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter. ★497 · Updated 5 months ago
- PyTorch implementation of RCG https://arxiv.org/abs/2312.03701 ★935 · Updated last year
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ★758 · Updated last month
- Scaling Diffusion Transformers with Mixture of Experts ★408 · Updated last year
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. ★1,896 · Updated last month