InternRobotics / G2VLM
G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
☆258Updated 3 weeks ago
Alternatives and similar repositories for G2VLM
Users who are interested in G2VLM are comparing it to the repositories listed below
- Source code for PhysGen3D.☆240Updated 4 months ago
- SAM 3D Objects with Multi-view Images☆194Updated 2 months ago
- [Official] AstraNav-Memory: Contexts Compression for Long Memory. An image-centric memory framework for lifelong embodied navigation via …☆29Updated 2 weeks ago
- ☆140Updated 10 months ago
- 🌐 3D and 4D World Modeling: A Survey☆793Updated 3 weeks ago
- [NeurIPS 25] TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels☆180Updated last month
- [ICRA 2026] A Unified Driving World Model for Future Generation and Perception☆136Updated 6 months ago
- 4DNeX: Feed-Forward 4D Generative Modeling Made Easy☆818Updated last month
- Official Implementation of Paper [DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation]☆74Updated last month
- [NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations☆497Updated 2 months ago
- Official implementation of "Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers"☆45Updated 10 months ago
- [CVPR2024] Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion☆137Updated last year
- [NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D☆200Updated last month
- RynnEC: Bringing MLLMs into Embodied World☆384Updated 3 months ago
- Are Video Models Ready as Zero-shot Reasoners?☆84Updated 2 months ago
- [ICLR 2026] Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation☆378Updated last week
- [CoRL 2025 Oral] One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation.☆445Updated 5 months ago
- Official code of Motus: A Unified Latent Action World Model☆616Updated last month
- [ICRA 2025] PUGS: Zero-shot Physical Understanding with Gaussian Splatting.☆104Updated 10 months ago
- [NeurIPS 2025 Spotlight] Towards Understanding Camera Motions in Any Video☆269Updated 2 months ago
- [ICLR 2026] NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics☆120Updated last week
- OmniNWM: Omniscient Navigation World Models for Autonomous Driving☆272Updated 3 months ago
- [CVPR 2025, All Strong Accept] TSP3D: Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding☆249Updated 7 months ago
- [SIGGRAPH Conference 2024] GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis☆157Updated 10 months ago
- [ECCV2024] DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling☆229Updated 2 months ago
- [NeurIPS 2025 Spotlight] A Native Multimodal LLM for 3D Generation and Understanding☆539Updated 3 months ago
- Official implementation of "ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation"☆85Updated last month
- [AAAI 2026 🔥] Official implementation of "NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representation"☆176Updated 5 months ago
- ICCV 2025 | TesserAct: Learning 4D Embodied World Models☆379Updated 6 months ago
- [ICCV 2025] DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation☆186Updated 5 months ago