lichao-sun / SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
☆497 · Updated last year
Alternatives and similar repositories for SoraReview
Users interested in SoraReview are comparing it to the libraries listed below.
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆585 · Updated 9 months ago
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ☆494 · Updated 3 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆446 · Updated 6 months ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆622 · Updated 6 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆386 · Updated last year
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆613 · Updated 9 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆452 · Updated 8 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆618 · Updated 10 months ago
- My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆249 · Updated last week
- [CVPR 2024 Highlight] VBench - We Evaluate Video Generation ☆1,126 · Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆920 · Updated last month
- A reading list of video generation ☆606 · Updated last week
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆461 · Updated 11 months ago
- [TMLR 2025 🔥] A survey of autoregressive models in vision. ☆660 · Updated last week
- A list for Text-to-Video, Image-to-Video works ☆241 · Updated 2 months ago
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆651 · Updated 9 months ago
- Emu Series: Generative Multimodal Models from BAAI ☆1,739 · Updated 10 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,816 · Updated 11 months ago
- Efficient Multimodal Large Language Models: A Survey ☆362 · Updated 3 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆824 · Updated last year
- [ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,625 · Updated this week
- PyTorch implementation of RCG https://arxiv.org/abs/2312.03701 ☆918 · Updated 10 months ago
- Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding ☆626 · Updated 7 months ago
- Next-Token Prediction is All You Need ☆2,173 · Updated 4 months ago
- Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI ☆1,182 · Updated last month
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆861 · Updated 2 months ago
- Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers" ☆928 · Updated last year
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. ☆1,854 · Updated 3 months ago
- This repo contains the code for 1D tokenizer and generator ☆963 · Updated 4 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆230 · Updated last year