Wolfv0 / FoundationMotion
☆119Updated 3 weeks ago
Alternatives and similar repositories for FoundationMotion
Users who are interested in FoundationMotion compare it to the libraries listed below.
- ☆65Updated last month
- Visual Spatial Tuning☆172Updated last week
- [Arxiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization☆56Updated 4 months ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 4 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆61Updated last year
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight)☆117Updated 10 months ago
- (3DV 2026 Oral) L4P -- a feed-forward foundational model designed for multiple low-level 4D vision perception tasks.☆57Updated 2 months ago
- [NeurIPS 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation☆127Updated 6 months ago
- Unifying 2D and 3D Vision-Language Understanding☆121Updated 6 months ago
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)☆124Updated last month
- Scaling Spatial Intelligence with Multimodal Foundation Models☆170Updated this week
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio…☆77Updated last month
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆128Updated 3 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces☆87Updated 8 months ago
- ROOT: VLM based System for Indoor Scene Understanding and Beyond☆39Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 11 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation☆30Updated 8 months ago
- SPAgent, a spatial intelligence agent designed to operate in the physical and spatial world.☆98Updated 2 weeks ago
- Sora Generates Videos with Stunning Geometrical Consistency☆51Updated last year
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding☆100Updated last year
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024☆22Updated last year
- ☆65Updated 2 months ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding☆129Updated 8 months ago
- [CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding☆62Updated last year
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆79Updated 3 weeks ago
- A Large-scale Video Action Dataset☆388Updated 3 weeks ago
- Code for our paper: Learning Camera Movement Control from Real-World Drone Videos☆34Updated 9 months ago
- Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"☆49Updated last month
- ☆41Updated 8 months ago
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding☆17Updated 6 months ago