facebookresearch / Action100M
A Large-scale Video Action Dataset
☆388Updated 3 weeks ago
Alternatives and similar repositories for Action100M
Users who are interested in Action100M are comparing it to the repositories listed below.
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 4 months ago
- [ICLR 2026] Astra: General Interactive World Model with Autoregressive Denoising☆220Updated last week
- A list of works on video generation towards world model☆337Updated this week
- ☆163Updated last year
- Cambrian-S: Towards Spatial Supersensing in Video☆488Updated last month
- [NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration"☆255Updated last month
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models.☆73Updated 3 months ago
- [NeurIPS 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation☆127Updated 6 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆79Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆209Updated 4 months ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆126Updated 3 months ago
- Code for the Molmo2 Vision-Language Model☆151Updated last month
- Scaling Spatial Intelligence with Multimodal Foundation Models☆170Updated this week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 11 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆203Updated 9 months ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆33Updated last month
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆64Updated 2 weeks ago
- ☆184Updated last week
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆139Updated 5 months ago
- Official repo for UAE☆164Updated last month
- [ICLR’26] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control☆94Updated 7 months ago
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation☆102Updated this week
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)☆124Updated last month
- ☆63Updated last month
- [ICLR 2026] Trace Anything: Representing Any Video in 4D via Trajectory Fields☆489Updated 3 months ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models☆169Updated last month
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling☆144Updated 2 weeks ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆62Updated 9 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning☆57Updated 3 months ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆162Updated last month