ByteDance-Seed / VideoWorld
[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing their environment.
☆548Updated last month
Alternatives and similar repositories for VideoWorld:
Users who are interested in VideoWorld are comparing it to the repositories listed below.
- ☆292Updated last month
- CogView4, CogView3-Plus and CogView3 (ECCV 2024)☆1,003Updated 3 weeks ago
- ☆844Updated last month
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…☆356Updated this week
- MineWorld: A Real-time interactive world model on Minecraft☆270Updated last week
- SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers☆491Updated this week
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation☆928Updated this week
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]☆469Updated this week
- [TPAMI under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"☆549Updated 4 months ago
- ☆225Updated 2 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning☆176Updated last week
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant☆267Updated last month
- [CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis☆1,200Updated 2 months ago
- Multimodal Models in Real World☆493Updated 2 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning.☆506Updated last week
- Frontier Multimodal Foundation Models for Image and Video Understanding☆751Updated last week
- 🔥🔥First-ever hour scale video understanding models☆286Updated last week
- Any-length Video Inpainting and Editing with Plug-and-Play Context Control☆340Updated 2 weeks ago
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization☆207Updated last week
- ☆369Updated last month
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities☆757Updated this week
- [CVPR 2025 Highlight] X-Dyna: Expressive Dynamic Human Image Animation☆236Updated 2 months ago
- ☆311Updated 4 months ago
- Explore the Multimodal “Aha Moment” on 2B Model☆577Updated last month
- A Unified Tokenizer for Visual Generation and Understanding☆262Updated last week
- [ARXIV'25] GameFactory: Creating New Games with Generative Interactive Videos☆281Updated last month
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization☆477Updated this week
- FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation☆423Updated last month
- Next-Token Prediction is All You Need☆2,099Updated last month
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…☆2,634Updated last week