[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆231 · Apr 29, 2026 · Updated last week
Alternatives and similar repositories for SenseNova-SI
Users who are interested in SenseNova-SI are comparing it to the libraries listed below.
- ☆20 · Apr 14, 2026 · Updated 3 weeks ago
- [ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models ☆92 · Mar 9, 2026 · Updated last month
- Visual Spatial Tuning ☆198 · Mar 25, 2026 · Updated last month
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆231 · Nov 28, 2025 · Updated 5 months ago
- In our implementation of Qwen-Image-Edit, we employ block causal attention to improve inference speed. ☆50 · Feb 16, 2026 · Updated 2 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆462 · Apr 16, 2026 · Updated 3 weeks ago
- LEO: A powerful Hybrid Multimodal LLM ☆20 · Jan 18, 2025 · Updated last year
- ☆149 · Mar 23, 2026 · Updated last month
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆88 · Mar 9, 2026 · Updated last month
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆136 · Jun 10, 2025 · Updated 10 months ago
- A python script for downloading huggingface datasets and models. ☆20 · Apr 10, 2025 · Updated last year
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts. ☆245 · Oct 17, 2025 · Updated 6 months ago
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o… ☆28 · Aug 7, 2025 · Updated 8 months ago
- Dynamic 3D Foundation Model using Causal Transformer. [ICLR 2026] ☆355 · Mar 26, 2026 · Updated last month
- open-sourced video dataset with dynamic scenes and camera movements annotation ☆91 · Apr 24, 2025 · Updated last year
- [ICLR 2025] Diffusion²: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models ☆56 · Mar 18, 2025 · Updated last year
- [NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆462 · Feb 5, 2026 · Updated 3 months ago
- Official implementation of "RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics" ☆72 · Jan 19, 2026 · Updated 3 months ago
- [CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆547 · Apr 22, 2026 · Updated 2 weeks ago
- [CVPR 2024] GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds ☆18 · Mar 29, 2024 · Updated 2 years ago
- Official Repo for Self-Forcing++ High Quality Long Video Generation ☆255 · Oct 13, 2025 · Updated 6 months ago
- Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Att… ☆14 · Jul 9, 2025 · Updated 9 months ago
- Pi0-VLA Repository of "MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies" ☆27 · Mar 9, 2026 · Updated last month
- XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis ☆24 · Sep 26, 2024 · Updated last year
- Reasoning in Space via Grounding in the World (ICLR 2025) ☆52 · Nov 3, 2025 · Updated 6 months ago
- Code for "Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views", CVPR 2025 ☆49 · Jul 7, 2025 · Updated 9 months ago
- ☆54 · Jul 4, 2025 · Updated 10 months ago
- [ICLR 2026] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation ☆112 · Jan 27, 2026 · Updated 3 months ago
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory ☆424 · Jul 25, 2025 · Updated 9 months ago
- [SIGGRAPH Asia 25] Official code for Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic… ☆35 · Mar 3, 2026 · Updated 2 months ago
- [ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy ☆951 · Feb 27, 2026 · Updated 2 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆68 · Mar 22, 2026 · Updated last month
- More reliable Video Understanding Evaluation ☆15 · Sep 23, 2025 · Updated 7 months ago
- [TCSVT'26] LaSSM: Efficient Semantic-Spatial Query Decoding via Local Aggregation and State Space Models for 3D Instance Segmentation ☆23 · Feb 22, 2026 · Updated 2 months ago
- Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆172 · Oct 21, 2025 · Updated 6 months ago
- Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion (ICCV 2025 Highlight) ☆29 · Mar 15, 2026 · Updated last month
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.… ☆54 · Jan 27, 2026 · Updated 3 months ago
- REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation ☆50 · Apr 25, 2026 · Updated last week
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆45 · Nov 30, 2025 · Updated 5 months ago