manycore-research / SpatialLM
SpatialLM: Large Language Model for Spatial Understanding
☆2,563 Updated this week
Alternatives and similar repositories for SpatialLM:
Users who are interested in SpatialLM are comparing it to the libraries listed below.
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,513 Updated this week
- [CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors ☆1,686 Updated 3 weeks ago
- ☆2,753 Updated last week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,071 Updated last week
- A suite of image and video neural tokenizers ☆1,590 Updated last month
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆968 Updated this week
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,284 Updated 5 months ago
- Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accele… ☆7,847 Updated this week
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆2,362 Updated last week
- Witness the aha moment of VLM with less than $3. ☆3,430 Updated last month
- YOLOE: Real-Time Seeing Anything ☆945 Updated last week
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models ☆1,015 Updated this week
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,655 Updated 2 weeks ago
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning. ☆1,422 Updated this week
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images ☆744 Updated last month
- The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator" ☆475 Updated this week
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆1,919 Updated 2 months ago
- Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources ☆1,549 Updated 2 weeks ago
- [CVPR'25] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision ☆840 Updated last week
- Grounding Image Matching in 3D with MASt3R ☆1,975 Updated 2 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆1,903 Updated last week
- [ARXIV'25] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video ☆755 Updated this week
- Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion" ☆1,090 Updated last week
- [CVPR 2025] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos ☆793 Updated 2 weeks ago
- Train your AI self, amplify you, bridge the world ☆6,291 Updated this week
- 4M: Massively Multimodal Masked Modeling ☆1,701 Updated 3 weeks ago
- DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos ☆1,240 Updated last month
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,529 Updated this week
- The best OSS video generation models ☆3,056 Updated 2 months ago
- Solve Visual Understanding with Reinforced VLMs ☆4,400 Updated last week