[NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆39Oct 14, 2025Updated 5 months ago
Alternatives and similar repositories for SSR
Users that are interested in SSR are comparing it to the libraries listed below
- ☆23May 8, 2025Updated 10 months ago
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models☆53Feb 23, 2026Updated 3 weeks ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- ☆19Sep 10, 2025Updated 6 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- Official code for paper Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models☆59Jan 16, 2026Updated 2 months ago
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025☆13Aug 25, 2025Updated 6 months ago
- ☆93May 31, 2025Updated 9 months ago
- Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models☆24Jan 6, 2026Updated 2 months ago
- VR-based Robot Teleoperation and Data Collection System for Humanoid Whole-Body VLA (Unitree G1)☆164Feb 17, 2026Updated last month
- [NeurIPS 2025] Official repository for “FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models”☆30Dec 9, 2025Updated 3 months ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆24Oct 22, 2025Updated 5 months ago
- ☆16Dec 25, 2025Updated 2 months ago
- 🔥 The first open-sourced diffusion vision-language-action model. [ICLR 2026]☆164Mar 12, 2026Updated last week
- An unofficial PyTorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment☆24Jan 9, 2025Updated last year
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens☆25Nov 6, 2023Updated 2 years ago
- ☆13Jan 22, 2025Updated last year
- [AAAI 2026] Official code for MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Man…☆64Jul 31, 2025Updated 7 months ago
- Code implementation of the paper 'FIction: 4D Future Interaction Prediction from Video'☆18Mar 19, 2025Updated last year
- Official Implementation of Towards Open Vocabulary Video Semantic Segmentation☆14Feb 27, 2025Updated last year
- ROS landmark list publisher that was converted from apriltag_ros/AprilTagDetectionArray for cartographer_ros landmark topic☆10Nov 18, 2021Updated 4 years ago
- [CVPR 2026] HiF-VLA: An efficient, bidirectional spatiotemporal expansion Vision-Language-Action Model☆50Mar 11, 2026Updated last week
- [NeurIPS 2025] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence☆451Feb 5, 2026Updated last month
- Official implementation of paper "Vision Graph Prompting via Semantic Low-Rank Decomposition", ICML 2025☆16Dec 25, 2025Updated 2 months ago
- The official implementation of Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion [AAAI'2…☆16Feb 2, 2026Updated last month
- ☆10Nov 28, 2023Updated 2 years ago
- Codebase for "Towards Generalizable Safety in Crowd Navigation via Conformal Uncertainty Handling" [CoRL 2025].☆29Jan 9, 2026Updated 2 months ago
- Official implementation of paper "GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model", ICML 2025☆15Dec 25, 2025Updated 2 months ago
- ☆17May 13, 2025Updated 10 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆47Nov 10, 2024Updated last year
- [ICCV 2023 & IJCV 2026] PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection☆22Aug 12, 2024Updated last year
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs☆55Dec 7, 2025Updated 3 months ago
- library to finetune VLAs☆41Feb 7, 2026Updated last month
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022☆11Aug 20, 2022Updated 3 years ago
- [ICLR 2026] Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models"☆163Feb 16, 2026Updated last month
- Official code for "Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation" (ICLR2026)☆126Mar 3, 2026Updated 2 weeks ago
- [CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning☆56Apr 1, 2025Updated 11 months ago
- Spirit-v1.5: A Robotic Foundation Model by Spirit AI☆536Jan 14, 2026Updated 2 months ago
- DiWA: Diffusion Policy Adaptation with World Models☆74Mar 4, 2026Updated 2 weeks ago