Wolfv0 / FoundationMotionLinks
☆119Updated 3 weeks ago
Alternatives and similar repositories for FoundationMotion
Users that are interested in FoundationMotion are comparing it to the libraries listed below
Sorting:
- Visual Spatial Tuning☆172Updated last week
- ☆63Updated last month
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 4 months ago
- [Nips 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation☆127Updated 6 months ago
- [Arxiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization☆56Updated 4 months ago
- Unifying 2D and 3D Vision-Language Understanding☆121Updated 6 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces☆87Updated 8 months ago
- (3DV 2026 Oral) L4P -- a feed-forward foundational model designed for multiple low-level 4D vision perception tasks.☆56Updated 2 months ago
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)☆124Updated last month
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆126Updated 3 months ago
- [NeurIPS 2025]《SD-VLM: Spatial Measuring and Understanding with Depth-encoded Vision Language Models》☆33Updated last month
- ☆65Updated 2 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆61Updated last year
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio…☆77Updated last month
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight)☆117Updated 10 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models☆170Updated this week
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models.☆73Updated 3 months ago
- ROOT: VLM based System for Indoor Scene Understanding and Beyond☆39Updated last year
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆79Updated 3 weeks ago
- ☆38Updated last month
- ☆41Updated 8 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation☆30Updated 8 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆83Updated 3 weeks ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 11 months ago
- SPAgent, a spatial intelligence agent designed to operate in the physical and spatial world.☆98Updated 2 weeks ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆60Updated 7 months ago
- Scaling Properties of Diffusion Models For Perceptual Tasks (CVPR 2025)☆44Updated 9 months ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding☆129Updated 8 months ago
- Sora Generates Videos with Stunning Geometrical Consistency☆51Updated last year
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models☆85Updated 3 weeks ago