yliu-cs/SSR

[NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

☆40

Alternatives and similar repositories for SSR

Users that are interested in SSR are comparing it to the libraries listed below.

yliu-cs / MMaDA-VLA
[ACM MM'26] MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation
☆63 · May 14, 2026 · Updated 2 months ago
BofangJia / SDM-Policy
Score and Distribution Matching Policy: Advanced accelerated Visuomotor Policies via matched distillation
☆11 · May 9, 2025 · Updated last year
ErikZ719 / CoTA
[ICLR 26] Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective
☆16 · Mar 6, 2026 · Updated 4 months ago
Wu0409 / Antidote
[CVPR'25] Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
☆20 · Oct 11, 2025 · Updated 9 months ago
OpenHelix-Team / VLA-RFT
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning
☆162 · Oct 6, 2025 · Updated 9 months ago
johnson111788 / SpatialReasoner
Training recipe for SpatialReasoner [NeurIPS 2025]
☆45 · Apr 5, 2026 · Updated 3 months ago
MINT-SJTU / STI-Bench
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
☆39 · Jan 12, 2026 · Updated 6 months ago
OpenGVLab / De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
☆35 · Jun 7, 2024 · Updated 2 years ago
ShijieZhou-UCLA / VLM4D
[ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
☆55 · Nov 20, 2025 · Updated 8 months ago
sled-group / COMFORT
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…
☆22 · Oct 24, 2024 · Updated last year
OpenHelix-Team / OpenHelix
OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation
☆388 · Aug 27, 2025 · Updated 10 months ago
Cognition2Action-Lab / VLA-TMEE
Reshaping Action Error Distributions for Reliable Vision-Language-Action Models
☆17 · Feb 5, 2026 · Updated 5 months ago
Jingtong0527 / RobuRCDet
☆18 · Sep 10, 2025 · Updated 10 months ago
Yangr116 / VST
[ECCV2026] Visual Spatial Tuning
☆200 · Mar 25, 2026 · Updated 3 months ago
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17 · Feb 13, 2025 · Updated last year
jiachengliu3 / OpenWBC
VR-based Robot Teleoperation and Data Collection System for Humanoid Whole-Body VLA (Unitree G1)
☆171 · Feb 17, 2026 · Updated 5 months ago
Git-HB-CHEN / MOFO
Multi-Organ Foundation Model for Universal Ultrasound Image Segmentation with Task Prompt and Anatomical Prior
☆15 · Sep 30, 2024 · Updated last year
OpenHelix-Team / HiF-VLA
[CVPR 2026] HiF-VLA: An efficient, bidirectional spatiotemporal expansion Vision-Language-Action Model
☆75 · Mar 11, 2026 · Updated 4 months ago
zhoujiahuan1991 / ICML2025-TCPA
☆23 · May 8, 2025 · Updated last year
OpenHelix-Team / Spatial-Forcing
Official implementation of Spatial-Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model [ICLR2026]
☆268 · Jul 7, 2026 · Updated 2 weeks ago
Yifan-Song793 / InfoCL
Findings of EMNLP 2023: InfoCL: Alleviating Catastrophic Forgetting in Continual Text Classification from An Information Theoretic Perspe…
☆14 · Aug 13, 2024 · Updated last year
LoveJu1y / LaRA-VLA
[ICML 2026] Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
☆80 · May 18, 2026 · Updated 2 months ago
LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆246 · Nov 28, 2025 · Updated 7 months ago
haoningwu3639 / SpatialScore
[CVPR 2026 Highlight] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
☆84 · May 28, 2026 · Updated last month
Vegetebird / CA-MLLM
[ICLR 2026] Official implementation of the paper "📷 On the Generalization Capacities of MLLMs for Spatial Intelligence"
☆29 · Mar 17, 2026 · Updated 4 months ago
yu2hi13 / P2SAM
Official implementation for P2SAM (ACM MM 2024)
☆14 · Dec 7, 2024 · Updated last year
KuanchihHuang / Reason3D
[3DV 2025] Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
☆124 · May 30, 2025 · Updated last year
Yifan-Song793 / GoodBadGreedy
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
☆31 · Jul 17, 2024 · Updated 2 years ago
THU-SI / Spatial-MLLM
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆480 · Feb 5, 2026 · Updated 5 months ago
ZhefeiGong / carp
[ICCV2025] Official code repository of "CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction"
☆61 · Aug 10, 2025 · Updated 11 months ago
OpenHelix-Team / frappe
Official implementation of FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
☆55 · Mar 24, 2026 · Updated 4 months ago
BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆28 · Dec 16, 2024 · Updated last year
InternLM / Spatial-SSRL
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆133 · Apr 7, 2026 · Updated 3 months ago
ricl-vla / ricl_openpi
☆67 · Aug 7, 2025 · Updated 11 months ago
ZJU-REAL / SpatialLadder
[ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
☆99 · Jun 9, 2026 · Updated last month
80chen86 / IPDN
☆17 · Dec 25, 2025 · Updated 6 months ago
ShenXinda / ORBSLAM2_Semantic_Mapping
Semantic mapping based on pixel level classification.
☆23 · Feb 1, 2021 · Updated 5 years ago
Qualcomm-AI-research / skip-attention
☆21 · May 7, 2024 · Updated 2 years ago