[CVPR2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆198 · Apr 10, 2026 · Updated this week
Alternatives and similar repositories for SenseNova-SI
Users that are interested in SenseNova-SI are comparing it to the libraries listed below.
- [ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models ☆83 · Mar 9, 2026 · Updated last month
- ☆20 · Oct 15, 2025 · Updated 6 months ago
- Visual Spatial Tuning ☆193 · Mar 25, 2026 · Updated 3 weeks ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆224 · Nov 28, 2025 · Updated 4 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆456 · Mar 25, 2026 · Updated 3 weeks ago
- LEO: A powerful Hybrid Multimodal LLM ☆20 · Jan 18, 2025 · Updated last year
- ☆142 · Mar 23, 2026 · Updated 3 weeks ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆87 · Mar 9, 2026 · Updated last month
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆136 · Jun 10, 2025 · Updated 10 months ago
- A python script for downloading huggingface datasets and models. ☆20 · Apr 10, 2025 · Updated last year
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts. ☆238 · Oct 17, 2025 · Updated 5 months ago
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o… ☆28 · Aug 7, 2025 · Updated 8 months ago
- Dynamic 3D Foundation Model using Causal Transformer. [ICLR 2026] ☆325 · Mar 26, 2026 · Updated 3 weeks ago
- open-sourced video dataset with dynamic scenes and camera movements annotation ☆90 · Apr 24, 2025 · Updated 11 months ago
- [ICLR 2025] Diffusion²: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models ☆56 · Mar 18, 2025 · Updated last year
- [CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆536 · Apr 9, 2026 · Updated last week
- [NeurIPS 2025] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆457 · Feb 5, 2026 · Updated 2 months ago
- Official implementation of "RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics" ☆71 · Jan 19, 2026 · Updated 2 months ago
- [CVPR 2024] GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds ☆18 · Mar 29, 2024 · Updated 2 years ago
- Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Att… ☆14 · Jul 9, 2025 · Updated 9 months ago
- Pi0-VLA Repository of "MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies" ☆27 · Mar 9, 2026 · Updated last month
- Code for "Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views", CVPR 2025 ☆48 · Jul 7, 2025 · Updated 9 months ago
- XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis ☆23 · Sep 26, 2024 · Updated last year
- Reasoning in Space via Grounding in the World (ICLR 2025) ☆52 · Nov 3, 2025 · Updated 5 months ago
- ☆52 · Jul 4, 2025 · Updated 9 months ago
- [ICLR 2026] NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction ☆58 · Updated this week
- [ICLR 2026] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation ☆112 · Jan 27, 2026 · Updated 2 months ago
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory ☆422 · Jul 25, 2025 · Updated 8 months ago
- [ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy ☆939 · Feb 27, 2026 · Updated last month
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆67 · Mar 22, 2026 · Updated 3 weeks ago
- This repo contains the code for the paper "Object-cropping for SSL". ☆18 · Feb 14, 2023 · Updated 3 years ago
- More reliable Video Understanding Evaluation ☆15 · Sep 23, 2025 · Updated 6 months ago
- ☆205 · Oct 22, 2025 · Updated 5 months ago
- Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆172 · Oct 21, 2025 · Updated 5 months ago
- Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion (ICCV 2025 Highlight) ☆29 · Mar 15, 2026 · Updated last month
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.… ☆51 · Jan 27, 2026 · Updated 2 months ago
- REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation ☆49 · Updated this week
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆45 · Nov 30, 2025 · Updated 4 months ago
- A Large-scale Video Action Dataset ☆444 · Jan 16, 2026 · Updated 2 months ago