facebookresearch / Action100M
A Large-scale Video Action Dataset
☆388Updated 3 weeks ago
Alternatives and similar repositories for Action100M
Users who are interested in Action100M are comparing it to the libraries listed below.
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 4 months ago
- [ICLR 2026] Astra: General Interactive World Model with Autoregressive Denoising☆220Updated last week
- A list of works on video generation towards world model☆337Updated this week
- [NeurIPS 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation☆127Updated 6 months ago
- ☆163Updated last year
- Cambrian-S: Towards Spatial Supersensing in Video☆492Updated last month
- [NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration"☆258Updated last month
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆128Updated 3 months ago
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models.☆73Updated 3 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆79Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆209Updated 4 months ago
- [ICLR’26] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control☆94Updated 7 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆203Updated 9 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆69Updated 2 weeks ago
- [ICLR 2026] Trace Anything: Representing Any Video in 4D via Trajectory Fields☆501Updated 3 months ago
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)☆124Updated last month
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation☆102Updated this week
- ☆65Updated last month
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling☆144Updated 2 weeks ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models☆169Updated last month
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models☆85Updated 3 weeks ago
- [NeurIPS 2025] WorldMem: Long-term Consistent World Simulation with Memory☆334Updated last month
- ☆184Updated last week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 11 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆139Updated 5 months ago
- ☆124Updated 3 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆62Updated 9 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling☆427Updated last month
- Generative World Explorer☆165Updated 7 months ago
- Code for the Molmo2 Vision-Language Model☆158Updated last month