WorldModelBench-Team / WorldModelBench
☆31Updated 6 months ago
Alternatives and similar repositories for WorldModelBench
Users who are interested in WorldModelBench are comparing it to the repositories listed below:
- ☆63Updated last month
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Updated 11 months ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.…☆35Updated last week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆207Updated 3 months ago
- Visual Spatial Tuning☆171Updated this week
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models.☆71Updated 3 months ago
- [ICLR 2025] Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding☆48Updated 9 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 11 months ago
- ☆52Updated last year
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆46Updated 5 months ago
- A list of works on video generation towards world model☆334Updated this week
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"☆172Updated last month
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders☆188Updated last week
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 3 months ago
- Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Gene…☆165Updated this week
- VideoNSA: Native Sparse Attention Scales Video Understanding☆79Updated 2 months ago
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆207Updated last week
- Towards Scalable Pre-training of Visual Tokenizers for Generation☆437Updated last month
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…☆34Updated 6 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning☆57Updated 3 months ago
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting☆70Updated 3 weeks ago
- ☆117Updated 2 weeks ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆157Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated last week
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation☆98Updated 3 weeks ago
- Official repository for ReasonGen-R1☆74Updated 7 months ago
- [ICCV 2025] The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"☆58Updated 10 months ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆33Updated last month
- Scaling Spatial Intelligence with Multimodal Foundation Models☆160Updated 3 weeks ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…☆30Updated last year