[CVPR2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆184 · Mar 19, 2026 · Updated last week
Alternatives and similar repositories for SenseNova-SI
Users that are interested in SenseNova-SI are comparing it to the libraries listed below.
- [ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models ☆80 · Mar 9, 2026 · Updated 2 weeks ago
- ☆20 · Oct 15, 2025 · Updated 5 months ago
- Visual Spatial Tuning ☆187 · Mar 17, 2026 · Updated last week
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆213 · Nov 28, 2025 · Updated 3 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆444 · Feb 25, 2026 · Updated last month
- ☆135 · Nov 1, 2025 · Updated 4 months ago
- LEO: A powerful Hybrid Multimodal LLM ☆20 · Jan 18, 2025 · Updated last year
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆85 · Mar 9, 2026 · Updated 2 weeks ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆136 · Jun 10, 2025 · Updated 9 months ago
- A python script for downloading huggingface datasets and models. ☆20 · Apr 10, 2025 · Updated 11 months ago
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts. ☆234 · Oct 17, 2025 · Updated 5 months ago
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o… ☆27 · Aug 7, 2025 · Updated 7 months ago
- open-sourced video dataset with dynamic scenes and camera movements annotation ☆87 · Apr 24, 2025 · Updated 11 months ago
- Dynamic 3D Foundation Model using Causal Transformer. [ICLR 2026] ☆318 · Mar 9, 2026 · Updated 2 weeks ago
- [ICLR 2025] Diffusion²: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models ☆56 · Mar 18, 2025 · Updated last year
- Official implementation of "RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics" ☆68 · Jan 19, 2026 · Updated 2 months ago
- [CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆521 · Mar 1, 2026 · Updated 3 weeks ago
- [CVPR 2024] GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds ☆18 · Mar 29, 2024 · Updated last year
- Code for "Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views", CVPR 2025 ☆47 · Jul 7, 2025 · Updated 8 months ago
- Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Att… ☆14 · Jul 9, 2025 · Updated 8 months ago
- XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis ☆23 · Sep 26, 2024 · Updated last year
- Pi0-VLA Repository of "MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies" ☆27 · Mar 9, 2026 · Updated 2 weeks ago
- Reasoning in Space via Grounding in the World (ICLR 2025) ☆50 · Nov 3, 2025 · Updated 4 months ago
- [ICLR 2026] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation ☆108 · Jan 27, 2026 · Updated 2 months ago
- ☆51 · Jul 4, 2025 · Updated 8 months ago
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory ☆418 · Jul 25, 2025 · Updated 8 months ago
- [ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy ☆932 · Feb 27, 2026 · Updated 3 weeks ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆66 · Updated this week
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆42 · Nov 30, 2025 · Updated 3 months ago
- Spatial Aptitude Training for Multimodal Language Models ☆25 · Feb 8, 2026 · Updated last month
- ☆203 · Oct 22, 2025 · Updated 5 months ago
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 6 months ago
- Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆168 · Oct 21, 2025 · Updated 5 months ago
- Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion (ICCV 2025 Highlight) ☆29 · Mar 15, 2026 · Updated last week
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.… ☆47 · Jan 27, 2026 · Updated last month
- REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation ☆50 · Mar 18, 2026 · Updated last week
- A Large-scale Video Action Dataset ☆438 · Jan 16, 2026 · Updated 2 months ago
- [CVPR 2026] ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training ☆312 · Mar 6, 2026 · Updated 2 weeks ago
- [CVPR26] MuM's a pretty good feature extractor for 3D tasks, probably the best. ☆70 · Mar 4, 2026 · Updated 3 weeks ago