[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆272May 14, 2026Updated last month
Alternatives and similar repositories for SenseNova-SI
Users that are interested in SenseNova-SI are comparing it to the libraries listed below.
- ☆21Apr 14, 2026Updated 2 months ago
- [ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models☆96Jun 9, 2026Updated last week
- Visual Spatial Tuning☆197Mar 25, 2026Updated 2 months ago
- Code for the paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'☆239Nov 28, 2025Updated 6 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling☆481Apr 16, 2026Updated 2 months ago
- LEO: A powerful Hybrid Multimodal LLM☆20Jan 18, 2025Updated last year
- ☆158Mar 23, 2026Updated 2 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆136Jun 10, 2025Updated last year
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆97Mar 9, 2026Updated 3 months ago
- A Python script for downloading Hugging Face datasets and models.☆20Apr 10, 2025Updated last year
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…☆28Aug 7, 2025Updated 10 months ago
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts.☆251May 15, 2026Updated last month
- Dynamic 3D Foundation Model using Causal Transformer. [ICLR 2026]☆379May 8, 2026Updated last month
- Open-source video dataset with dynamic scenes and camera movement annotations☆93Apr 24, 2025Updated last year
- [ICLR 2025] Diffusion²: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models☆58Mar 18, 2025Updated last year
- [CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations☆570Apr 22, 2026Updated last month
- [CVPR 2024] GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds☆18Mar 29, 2024Updated 2 years ago
- Official Repo for Self-Forcing++ High Quality Long Video Generation☆257Oct 13, 2025Updated 8 months ago
- Official implementation of "RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics"☆74Jan 19, 2026Updated 4 months ago
- [CVPR 2025] Code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".☆215Jun 4, 2025Updated last year
- Pi0-VLA Repository of "MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies"☆27Mar 9, 2026Updated 3 months ago
- XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis☆25Sep 26, 2024Updated last year
- Reasoning in Space via Grounding in the World (ICLR 2025)☆55Nov 3, 2025Updated 7 months ago
- Code for "Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views", CVPR 2025☆51Jul 7, 2025Updated 11 months ago
- ☆59Jul 4, 2025Updated 11 months ago
- Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Att…☆15Jul 9, 2025Updated 11 months ago
- [ICLR 2026] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation☆114Jan 27, 2026Updated 4 months ago
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory☆437Jul 25, 2025Updated 10 months ago
- [ACM'MM 2025] UAV Street-Satellite matching workshop Challenging paper, SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Media…☆25Dec 9, 2025Updated 6 months ago
- [ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy☆967Feb 27, 2026Updated 3 months ago
- This repo contains the code for the paper "Object-cropping for SSL".☆18Feb 14, 2023Updated 3 years ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆68Mar 22, 2026Updated 2 months ago
- VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]☆17Jun 1, 2026Updated 2 weeks ago
- The official implementation of "NAS-BNN: Neural Architecture Search for Binary Neural Networks"☆14Aug 30, 2024Updated last year
- [CVPR 2026 (Oral)] MAGICIAN: Efficient Long-Term Planning with Imagined Gaussians for Active Mapping☆150May 27, 2026Updated 2 weeks ago
- ☆214Oct 22, 2025Updated 7 months ago
- Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion (ICCV 2025 Highlight)☆31Mar 15, 2026Updated 3 months ago
- [Arxiv 2025] Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT☆177Oct 21, 2025Updated 7 months ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.…☆66Jan 27, 2026Updated 4 months ago