ByteDance-Seed / VideoWorld
[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing their environment.
☆561 · Updated 2 months ago
Alternatives and similar repositories for VideoWorld
Users that are interested in VideoWorld are comparing it to the libraries listed below
- 🔥🔥 First-ever hour-scale video understanding models ☆323 · Updated 3 weeks ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆529 · Updated this week
- Multimodal Models in Real World ☆503 · Updated 2 months ago
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆369 · Updated 3 weeks ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆278 · Updated last month
- CogView4, CogView3-Plus and CogView3 (ECCV 2024) ☆1,025 · Updated last month
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆194 · Updated this week
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆515 · Updated this week
- Frontier Multimodal Foundation Models for Image and Video Understanding ☆795 · Updated last month
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆191 · Updated 3 weeks ago
- ☆228 · Updated 2 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆370 · Updated last week
- Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,065 · Updated last week
- ☆310 · Updated 5 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆439 · Updated 8 months ago
- ☆173 · Updated 3 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆847 · Updated 3 weeks ago
- [CVPR 2024 Highlight] VBench - We Evaluate Video Generation ☆990 · Updated this week
- [ARXIV'25] GameFactory: Creating New Games with Generative Interactive Videos ☆288 · Updated last month
- Next-Token Prediction is All You Need ☆2,121 · Updated 2 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆515 · Updated last month
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆181 · Updated 4 months ago
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization ☆211 · Updated last month
- SEED-Story: Multimodal Long Story Generation with Large Language Model ☆844 · Updated 7 months ago
- [SIGGRAPH 2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control" ☆364 · Updated last month
- Explore the Multimodal “Aha Moment” on 2B Model ☆586 · Updated last month
- ☆151 · Updated this week
- A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gem… ☆1,191 · Updated this week
- VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling ☆411 · Updated this week
- A Unified Tokenizer for Visual Generation and Understanding ☆290 · Updated last week