Wolfv0 / FoundationMotion
☆119 Updated 3 weeks ago
Alternatives and similar repositories for FoundationMotion
Users who are interested in FoundationMotion are comparing it to the repositories listed below
- Visual Spatial Tuning ☆172 Updated last week
- ☆63 Updated last month
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆117 Updated 10 months ago
- [arXiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization ☆56 Updated 4 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆87 Updated 8 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆167 Updated 4 months ago
- [NeurIPS 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆127 Updated 6 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆170 Updated this week
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" ☆124 Updated last month
- [CVPR 2025] Program synthesis for 3D spatial reasoning ☆56 Updated 7 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆61 Updated last year
- Unifying 2D and 3D Vision-Language Understanding ☆121 Updated 6 months ago
- ☆41 Updated 8 months ago
- ☆65 Updated 2 months ago
- (3DV 2026 Oral) L4P -- a feed-forward foundational model designed for multiple low-level 4D vision perception tasks. ☆56 Updated 2 months ago
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ☆77 Updated last month
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆22 Updated last year
- ROOT: VLM based System for Indoor Scene Understanding and Beyond ☆39 Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 Updated 11 months ago
- A Large-scale Video Action Dataset ☆388 Updated 3 weeks ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" ☆126 Updated 3 months ago
- [AAAI 2026] Official implementation of the paper "SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D F… ☆24 Updated last month
- [ICCV 2025] Improving 3D Large Language Model via Robust Instruction Tuning ☆68 Updated 3 months ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ☆79 Updated 3 weeks ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆129 Updated 8 months ago
- Scaling Properties of Diffusion Models For Perceptual Tasks (CVPR 2025) ☆44 Updated 9 months ago
- Sora Generates Videos with Stunning Geometrical Consistency ☆51 Updated last year
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆101 Updated 3 weeks ago
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models. ☆73 Updated 3 months ago
- [NeurIPS 2025]《SD-VLM: Spatial Measuring and Understanding with Depth-encoded Vision Language Models》 ☆33 Updated last month